Abstract. Mark correlations provide a systematic approach to look at objects both
distributed in space and bearing intrinsic information, for instance on physical properties.
The interplay of the objects' properties (marks) with the spatial clustering is
of great interest for many applications: are, e.g., galaxies with high luminosities more
strongly clustered than dim ones? Do neighboring pores in a sandstone have similar
sizes? How does the shape of impact craters on a planet depend on the geological surface
properties? In this article, we give an introduction to the appropriate mathematical
framework to deal with such questions, i.e. the theory of marked point processes.
After having clarified the notion of segregation effects, we define universal test quantities
applicable to realizations of a marked point process. We show their power using
concrete data sets in analyzing the luminosity-dependence of the galaxy clustering, the
alignment of dark matter halos in gravitational N-body simulations, the morphology-
and diameter-dependence of the Martian crater distribution and the size correlations 
of pores in sandstone. In order to understand our data in more detail, we discuss the 
Boolean depletion model, the random field model and the Cox random field model. 
The first model describes depletion effects in the distribution of Martian craters and 
pores in sandstone, whereas the last one accounts at least qualitatively for the observed 
luminosity-dependence of the galaxy clustering. 



1.1 Marked point sets 

Observations of spatial patterns at various length scales frequently are the only 
point where the physical world meets theoretical models. In many cases these 
patterns consist of a number of comparable objects distributed in space such 
as pores in a sandstone, or craters on the surface of a planet. Another example 
is given in Figure 1.1, where we display the galaxy distribution as traced by a 
recent galaxy catalogue. The galaxies are represented as circles centered at their 
positions, whereas the size of the circles mirrors the luminosity of a galaxy. In 
order to test to which extent theoretical predictions fit the empirically found 
structures of that type, one has to rely on quantitative measures describing the 
physical information. Since theoretical models mostly do not try to explain the 
structures individually, but rather predict some of their generic properties, one 
has to adopt a statistical point of view and to interpret the data as a realiza- 
tion of a random process. In a first step one often confines oneself to the spatial 
distribution of the objects constituting the patterns and investigates their clus- 
tering thereby thinking of it as a realization of a point process. Assuming that 
perspective, however, one neglects a possible linkage between the spatial clus- 
tering and the intrinsic properties of the objects. For instance, there are strong 
indications that the clustering of galaxies depends on their luminosity as well as 
on their morphological type. Considering Figure 1.1, one might infer that lumi- 
nous galaxies are more strongly correlated than dim ones. Effects like that are 
referred to as mark segregation and provide insight into the generation and inter- 
actions of, e.g., galaxies or other objects under consideration. The appropriate 
statistical framework to describe the relation between the spatial distribution 
of physical objects and their inner properties is that of marked point processes, where
discrete, scalar-, or vector-valued marks are attached to the random points.
In this contribution we outline how to describe marked point processes; along
that line we discuss two notions of independence (Section 1.1) and define corresponding
statistics that allow us to quantify possible dependencies. After having
shown that some empirical data sets show significant signals of mark segregation
(Section 1.2), we turn to analytical models, motivated by both mathematical and
physical considerations (Section 1.3).

Fig. 1.1. The galaxy distribution as traced by the Southern Sky Redshift Survey 2
(SSRS 2). We show a part of the sample investigated, projected down into two
dimensions. Each circle represents a galaxy; its radius is proportional to the galaxy's
luminosity. For further details see Section 1.2.1.

1 Mark correlations: relating physical properties to spatial distributions



Contact distribution functions as presented in the contribution by D. Hug et al. 
in this volume are an alternative technique to measure and statistically quan- 
tify distances which finally can be used to relate physical properties to spatial 
structures. Mark correlation functions are useful to quantify molecular orientations
in liquid crystals (see the contribution by F. Schmid and N. H. Phuong in
this volume) or in self-assembling amphiphilic systems (see the contribution by
U. S. Schwarz and G. Gompper in this volume). Mark correlations may also be
relevant for studying anisotropies in composite or porous materials, which are
essential for elastic and transport properties (see the contributions by D. Jeulin,
C. Arns et al. and H.-J. Vogel in this volume).



1.1.1 The framework 



The empirical data - the positions $\mathbf{x}_i$ of some objects together with their
intrinsic properties $m_i$ - are interpreted as a realization of a marked point process
$\{(\mathbf{x}_i, m_i)\}_{i=1}^{N}$ (Stoyan, Kendall and Mecke, 1995). For simplicity we restrict
ourselves to homogeneous and isotropic processes.

The hierarchy of joint probability densities provides a suitable tool to describe
the stochastic properties of a marked point process. Thus, let $\varrho_1^{SM}((\mathbf{x}, m))$
denote the probability density of finding a point at $\mathbf{x}$ with a mark $m$. For a
homogeneous process this splits into $\varrho_1^{SM}((\mathbf{x}, m)) = \varrho\,\mathcal{M}_1(m)$, where $\varrho$ denotes
the mean number density of points in space and $\mathcal{M}_1(m)$ is the probability density
of finding the mark $m$ on an arbitrary point. Later on we need moments of this mark
distribution; for real-valued marks the $k$th moment of the mark distribution is
defined as

$$ \overline{m^k} = \int \mathrm{d}m \, \mathcal{M}_1(m)\, m^k ; \qquad (1.1) $$

the mark variance is $\sigma_M^2 = \overline{m^2} - \overline{m}^2$.

Accordingly, $\varrho_2^{SM}((\mathbf{x}_1, m_1), (\mathbf{x}_2, m_2))$ quantifies the probability density to find
two points at $\mathbf{x}_1$ and $\mathbf{x}_2$ with marks $m_1$ and $m_2$, respectively (for second-order
theory of marked point processes see [|8]]6(J)). It effectively depends only on
$m_1$, $m_2$, and the pair separation $r = |\mathbf{x}_2 - \mathbf{x}_1|$ for a homogeneous and isotropic
process. Two-point properties certainly are the simplest non-trivial quantities for
homogeneous random processes, but it may be necessary to move on to higher
correlations in order to discriminate between certain models.
1.1.2 Two notions of independence 

In the following we will discuss two notions of independence, which may arise 
for marked point patterns. For this, consider two Renaissance families, call them 
the Sforza and the Gonzaga. They used to build castles spread out more or less 
homogeneously over Italy. In order to describe this example in terms of a marked 
point process, we consider the locations of the castles as points on a map of Italy, 
and treat a castle's owner as a discrete mark, S and G, respectively. There are 
many ways how the castles can be built and related to each other. 

Independent sub-point processes: For example, the Sforza may build their castles
regardless of the Gonzaga castles. In that case the probability of finding a Sforza
castle at $\mathbf{x}_1$ and a Gonzaga castle at $\mathbf{x}_2$ factorizes into two one-point probabilities
and we can think of the Sforza and the Gonzaga castles as uncorrelated sub-point
processes. In the language of marked point processes this means, e.g., that

$$ \varrho_2^{SM}((\mathbf{x}_1, m_1), (\mathbf{x}_2, m_2)) = \varrho_1^{SM}((\mathbf{x}_1, m_1))\, \varrho_1^{SM}((\mathbf{x}_2, m_2)) = \varrho^2 \mathcal{M}_1(m_1)\, \mathcal{M}_1(m_2), \qquad (1.2) $$

for any $m_1 \neq m_2$. If all the joint $n$-point densities factorize into a product
of $n'$-point densities of one type each, then we speak of independent sub-point
processes. Dependent sub-point processes indicate interactions between points
of different marks; for instance, the Gonzaga may build their castles close to
the Sforza ones in order to avoid that a region becomes dominated by the other
family's castles.

Mark-independent clustering: A second type of independence refers to the question
whether the different families have different styles to plan their castles. For
instance, the Gonzaga may distribute their castles in a grid-like manner over
Italy, whereas the Sforza may incline to build a second castle close to each castle
they own. Rather than asking whether two sub-point processes (namely the
Gonzaga and the Sforza castles, respectively) are independent ("independent
sub-point processes"), we are now discussing whether they are different as regards
their statistical clustering properties. Any such difference means that the
clustering depends on the intrinsic mark of a point.

Whenever the two-point probability density of finding two objects at $\mathbf{x}_1$ and $\mathbf{x}_2$
depends on the objects' intrinsic properties we speak of mark-dependent clustering.
It is useful to rephrase this statement by using Bayes' theorem and the
conditional mark probability density

$$ \mathcal{M}_2(m_1, m_2 | \mathbf{x}_1, \mathbf{x}_2) = \frac{\varrho_2^{SM}((\mathbf{x}_1, m_1), (\mathbf{x}_2, m_2))}{\varrho_2(\mathbf{x}_1, \mathbf{x}_2)} \qquad (1.3) $$

in case the spatial product density $\varrho_2(\cdot)$ does not vanish. $\mathcal{M}_2(m_1, m_2 | \mathbf{x}_1, \mathbf{x}_2)$ is
the probability density of finding the marks $m_1$ and $m_2$ on objects located at $\mathbf{x}_1$
and $\mathbf{x}_2$, given that there are objects at these points. Clearly, $\mathcal{M}_2(m_1, m_2 | \mathbf{x}_1, \mathbf{x}_2)$
depends only on the pair separation $r = |\mathbf{x}_1 - \mathbf{x}_2|$ for homogeneous and isotropic
point processes. We speak of mark-independent clustering, if $\mathcal{M}_2(m_1, m_2 | r)$ factorizes

$$ \mathcal{M}_2(m_1, m_2 | r) = \mathcal{M}_1(m_1)\, \mathcal{M}_1(m_2) \qquad (1.4) $$

and thus does not depend on the pair separation. That means that regarding 
their marks, pairs with a separation r are not different from any other pairs. On 
the contrary, mark-dependent clustering or mark segregation implies that the 
marks on certain pairs show deviations from the global mark distribution. 
In order to distinguish between the two kinds of independence, let us consider the
case where we are given a map of Italy only showing the Gonzaga castles. If the
distribution of castles in Italy can be understood as consisting of independent
sub-point processes, we cannot infer anything about the Sforza castles from the
Gonzaga ones. However, if $\varrho_2^{SM}((\mathbf{x}_1, S), (\mathbf{x}_2, G)) > \varrho^2 \mathcal{M}_1(S)\, \mathcal{M}_1(G)$, Sforza
castles are likely to be found close to Gonzaga ones. Here, $\mathcal{M}_1(S)$ and $\mathcal{M}_1(G)$ are
the probabilities that a castle belongs to the Sforza or Gonzaga family. If, on the 
other hand, mark-independent clustering applies, typical clustering properties 
such as the spatial clustering strength are equal for both castle distributions, 
and the Gonzaga castles are in the statistical sense already representative of the 
whole castle distribution in Italy. That means in particular that, if the Gonzaga 
castles are clustered, so are the Sforza ones. 

Before we turn to applications, we have to develop practical test quantities in 
order to test for segregation effects in real data and to describe them in more 
detail. 



1.1.3 Investigating the independence of sub-point processes 

To investigate correlations between sub-point processes, suitably extended nearest
neighbor distribution functions or K-functions have been employed Jl6|p0| ].
Also the (conditional) cross-correlation functions can be used (see Eq. 1.8); for
a further test see |Q, p. 302. Here we consider a multivariate extension of the
J-function Q, as suggested by p9| .

For this, consider the nearest neighbor's distance distribution from an object
with mark $m_i$ to other objects with mark $m_j$, $G_{ij}(r)$ ("$i$ to $j$", for details see
[p9j). Let $G_{i\circ}(r)$ denote the distribution of the nearest neighbor's distance from
an object of type $i$ to any other object (denoted by $\circ$). Finally, $G_{\circ\circ}(r)$ is the
nearest neighbor distribution of all points. Similar extensions of the empty space
function are possible, too. Let $F_i(r)$ denote the distribution of the nearest
$i$-object's distance from an arbitrary position, whereas $F_\circ(r)$ is the nearest
object's distance distribution from a random point in space to any object in the
sample. We consider the following quantities:

$$ J_{ij}(r) = \frac{1 - G_{ij}(r)}{1 - F_j(r)}, \quad J_{i\circ}(r) = \frac{1 - G_{i\circ}(r)}{1 - F_\circ(r)}, \quad J_{\circ\circ}(r) = \frac{1 - G_{\circ\circ}(r)}{1 - F_\circ(r)}. \qquad (1.5) $$

They are defined whenever $F_j(r), F_\circ(r) < 1$. If two sub-point processes, defined
by marks $i \neq j$, are independent then one gets B9|

$$ J_{ij}(r) = 1. \qquad (1.6) $$

Note that the $J_{ij}$ depend on higher-order correlation functions, similar to the
J-function |35 . Suitable estimators for these J-functions are derived from
estimators of the F- and G-functions.
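To make the ratio $J_{ij}(r) = (1 - G_{ij}(r)) / (1 - F_j(r))$ more concrete, the following minimal sketch estimates it at a single radius from two lists of points. It is only an illustration under simplifying assumptions: the points live in a periodic box (so no edge correction is applied), and $G_{ij}$ and $F_j$ are plain empirical fractions. The helper names `periodic_dist`, `nearest_dist` and `J_ij` are hypothetical and are not the estimators referred to in the text.

```python
import math
import random

def periodic_dist(p, q, box=1.0):
    """Euclidean distance between p and q in a periodic box of side `box`."""
    s = 0.0
    for a, b in zip(p, q):
        d = abs(a - b)
        d = min(d, box - d)          # wrap around the periodic boundary
        s += d * d
    return math.sqrt(s)

def nearest_dist(p, targets, box=1.0):
    """Distance from p to the nearest point in `targets` (p itself is skipped)."""
    return min(periodic_dist(p, q, box) for q in targets if q is not p)

def J_ij(points_i, points_j, r, n_test=2000, box=1.0, rng=None):
    """Naive estimate of J_ij(r) = (1 - G_ij(r)) / (1 - F_j(r))."""
    rng = rng or random.Random(0)
    dim = len(points_j[0])
    # G_ij: fraction of type-i points whose nearest type-j neighbour lies within r
    g = sum(nearest_dist(p, points_j, box) <= r for p in points_i) / len(points_i)
    # F_j: the same fraction, measured from random test positions in the box
    tests = [tuple(rng.uniform(0, box) for _ in range(dim)) for _ in range(n_test)]
    f = sum(nearest_dist(t, points_j, box) <= r for t in tests) / n_test
    if f >= 1.0:
        raise ValueError("J_ij is undefined where F_j(r) = 1")
    return (1.0 - g) / (1.0 - f)
```

For two independently drawn uniform point sets the estimate should scatter around one, while type-j points placed right next to the type-i points push it towards zero.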



1.1.4 Investigating mark segregation 

In order to quantify the mark-dependent clustering or to look for mark
segregation, it proves useful to integrate the conditional probability density
$\mathcal{M}_2(m_1, m_2 | r)$ over the marks, weighting with a test function $f(m_1, m_2)$ J5^j5^] .
This procedure reduces the number of variables and leaves us with the weighted
pair average:

$$ \langle f \rangle_P (r) = \int \mathrm{d}m_1 \int \mathrm{d}m_2 \, f(m_1, m_2)\, \mathcal{M}_2(m_1, m_2 | r). \qquad (1.7) $$

The choice of an appropriate weight function depends on whether the marks are
non-quantitative labels or continuous physical quantities.

1. For labels only combinations of indicator functions are possible; the integral
degenerates into a sum over the labels. Suppose the marks of our objects
belong to classes labelled with $i, j, \ldots$; the conditional cross-correlation
functions are given by

$$ C_{ij}(r) = \langle \delta_{m_1 i}\, \delta_{m_2 j} + (1 - \delta_{ij})\, \delta_{m_2 i}\, \delta_{m_1 j} \rangle_P (r), \qquad (1.8) $$

with the Kronecker $\delta_{m_1 i} = 1$ for $m_1 = i$ and zero otherwise. Mark segregation
is indicated by $C_{ij} \neq 2 \varrho_i \varrho_j / \varrho^2$ for $i \neq j$ and $C_{ii} \neq \varrho_i^2 / \varrho^2$, where $\varrho_i$ denotes
the number density of points with label $i$. The $C_{ij}$ are cross-correlation
functions under the condition that two points are separated by a distance
of $r$ (compare p0{, p. 264; for applications see the Martian crater distribution
studied in Sect. 1.2.3 and Figure 1.7 in particular).

2. For positive real-valued marks $m$, the following pair averages prove to be
powerful and distinctive [5ll|7||:

(a) One of the simplest weights to be used is the mean mark:

$$ k_m(r) \equiv \frac{\langle m_1 + m_2 \rangle_P (r)}{2\, \overline{m}} \qquad (1.9) $$

quantifies the deviation of the mean mark on pairs with separation $r$
from the overall mean mark $\overline{m}$. A $k_m > 1$ indicates mark segregation
for point pairs with a separation $r$; specifically, their mean mark is then
larger than the overall mark average.

Closely related is Stoyan's $k_{mm}$ function using the squared geometric
mean of the marks as a weight |55| , |6C|

$$ k_{mm}(r) \equiv \frac{\langle m_1 m_2 \rangle_P (r)}{\overline{m}^2}. \qquad (1.10) $$

(b) Accordingly, higher moments of the marks may be used to quantify mark
segregation, like the mark fluctuations

$$ \mathrm{var}(r) = \langle (m_1 - \langle m_1 \rangle_P (r))^2 \rangle_P (r), \qquad (1.11) $$

or the mark variogram |7C|j6l| :

$$ \gamma(r) = \langle \tfrac{1}{2} (m_1 - m_2)^2 \rangle_P (r). \qquad (1.12) $$

(c) The mark covariance jl7| is

$$ \mathrm{cov}(r) = \langle m_1 m_2 \rangle_P (r) - \langle m_1 \rangle_P (r)\, \langle m_2 \rangle_P (r). \qquad (1.13) $$

Mark segregation can be detected by checking whether $\mathrm{cov}(r)$ differs from
zero. A $\mathrm{cov}(r)$ larger than zero, e.g., indicates that points with separation
$r$ tend to have similar marks. Sometimes the mark covariance is
normalized by the fluctuations [^3| : $\mathrm{cov}(r) / \mathrm{var}(r)$.

These conditional mark correlation functions can be calculated from only
three independent pair averages [5l| |: $\langle m \rangle_P (r)$, $\langle m_1 m_2 \rangle_P (r)$, and $\langle m^2 \rangle_P (r)$.
Thus the above mentioned characteristics are not independent, e.g.
$\mathrm{var}(r) = \gamma(r) + \mathrm{cov}(r)$.

We apply these mark correlation functions to the galaxy distribution in
Section 1.2.1 (Figure 1.3), to Martian craters in Section 1.2.3 (Figure 1.7) and
to pores in sandstones considered in Section 1.2.4.
3. Also vector-valued information $\mathbf{l}_i$, describing, e.g., the orientation of an
anisotropic object at position $\mathbf{x}_i$, may be available. It is therefore interesting
to consider vector marks, as done by |^,^,^0), who use a mark correlation
function to quantify the alignment of vector marks. Here we suggest
three mark correlation functions quantifying geometrically different possibilities
of an alignment. In order to ensure coordinate-independence of our
descriptors, we focus on scalar combinations of the vector marks using the
scalar product $\cdot$ and the cross product $\times$. Different from the case of scalar
marks, it is a non-trivial task to find a set of vector-mark correlation functions
which contain all possible information (at least up to a fixed order in
mark space). We provide a systematic account of how to construct suitable
vector-mark correlation functions in a complete and unique way for general
dimensions in the Appendix.

Here we only cite the most important results. For that we need the distance
vector between two points, $\mathbf{r} = \mathbf{x}_1 - \mathbf{x}_2$, the normalized distance vector,
$\hat{\mathbf{r}} = \mathbf{r}/r$, and the normalized vector mark $\hat{\mathbf{l}}_i = \mathbf{l}_i / l_i$ with $l_i = |\mathbf{l}_i|$. The
following conditional mark correlation functions will be used to quantify
alignment effects:
(a) $\mathcal{A}(r)$ quantifies the Alignment of the two vector marks $\mathbf{l}_1$ and $\mathbf{l}_2$:

$$ \mathcal{A}(r) \equiv \frac{1}{\overline{l}^2} \langle \mathbf{l}_1 \cdot \mathbf{l}_2 \rangle_P (r). \qquad (1.14) $$

It is proportional to the cosine of the angle between $\mathbf{l}_1$ and $\mathbf{l}_2$. We normalize
with the mean $\overline{l}$. For purely independent vector marks $\mathcal{A}(r)$ is
zero, whereas $\mathcal{A}(r) > 0$ means that the marks of pairs separated by $r$
tend to align parallel to each other. In some applications, e.g. for the
orientations of ellipsoidal objects, the vector mark is only defined up to
a sign, i.e. $\mathbf{l}$ and $-\mathbf{l}$ actually mean the same. In this case the absolute
value of the scalar product is useful:

$$ \mathcal{A}'(r) \equiv \frac{1}{\overline{l}^2} \langle |\mathbf{l}_1 \cdot \mathbf{l}_2| \rangle_P (r). \qquad (1.15) $$

For uncorrelated random vectors we get $\mathcal{A}'(r) = 1/2$. $\mathcal{A}$ and $\mathcal{A}'$ can readily
be generalized to any dimension $d$, where we expect
$\mathcal{A}' = \pi^{-1/2}\, \Gamma(d/2) / \Gamma((d+1)/2)$
for uncorrelated random orientations. In two dimensions $\mathcal{A}'$ is proportional
to $k_d$ as defined by |R| .

(b) $\mathcal{F}(r)$ quantifies the Filamentary alignment of the vectors $\mathbf{l}_1$ and $\mathbf{l}_2$ with
respect to the line connecting both halo positions:

$$ \mathcal{F}(r) \equiv \frac{1}{2\, \overline{l}} \langle |\mathbf{l}_1 \cdot \hat{\mathbf{r}}| + |\mathbf{l}_2 \cdot \hat{\mathbf{r}}| \rangle_P (r). \qquad (1.16) $$

$\mathcal{F}(r)$ is proportional to the cosine of the angle between $\mathbf{l}_1$ and the distance
vector $\hat{\mathbf{r}}$ connecting the points. For uncorrelated random vector marks,
we expect again $\mathcal{F}(r) = 1/2$; $\mathcal{F}(r)$ becomes larger than that whenever
the vector marks of the objects tend to point to objects separated by $r$.
An example is provided by rod-like metallic grains in an electric field:
they concentrate along the field lines and orient themselves parallel to
the field lines.

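(c) $\mathcal{P}(r)$ quantifies the Planar alignment of the vectors and the distance
vector. $\mathcal{P}(r)$ is proportional to the volume of the rhomb defined by $\mathbf{l}_1$, $\mathbf{l}_2$
and $\hat{\mathbf{r}}$:

$$ \mathcal{P}(r) \equiv \frac{1}{2\, \overline{l}} \left\langle \left| \mathbf{l}_1 \cdot \frac{\mathbf{l}_2 \times \hat{\mathbf{r}}}{|\mathbf{l}_2 \times \hat{\mathbf{r}}|} \right| + \left| \mathbf{l}_2 \cdot \frac{\mathbf{l}_1 \times \hat{\mathbf{r}}}{|\mathbf{l}_1 \times \hat{\mathbf{r}}|} \right| \right\rangle_P (r). \qquad (1.17) $$

Quite obviously, this quantity cannot be generalized to arbitrary dimensions;
the deeper reason for that will become clear in the Appendix.
We get $\mathcal{P}(r) = 1/2$ for randomly oriented vectors, whereas it becomes
larger when $\mathbf{l}_2$ is perpendicular to $\mathbf{l}_1$ as well as to $\hat{\mathbf{r}}$.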
Applications of vector marks can be found in Section 1.2.2 (Figure 1.4), where
we consider the orientation of dark matter halos in cosmological simulations.
But one can think of other applications: mark correlation functions may serve
as orientational order parameters in liquid crystals in order to discriminate
between nematic and smectic phases (see the contribution by F. Schmid and
N. H. Phuong in this volume). They can also quantify the local orientation
and order in liquids such as the recently measured five-fold local symmetry
found in liquid lead J50| . As a further application one could try to measure the
signature of hexatic phases in two-dimensional colloidal dispersions and in
2D melting scenarios occurring in experiments and simulations of hard-disk
systems (for a review on hard sphere models see |3^]). Finally, the orientations
of anisotropic channels in sandstone (see the contribution by C. Arns et al.
in this volume) are relevant for macroscopic transport properties; therefore
their quantitative characterization in terms of mark correlation functions
might be interesting.
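The scalar pair averages discussed above can be illustrated with a small sketch. It estimates the three basic pair averages $\langle m \rangle_P$, $\langle m_1 m_2 \rangle_P$ and $\langle m^2 \rangle_P$ in one separation bin by a plain average over all pairs in that bin, and derives the mean mark ratio, Stoyan's $k_{mm}$, the mark fluctuations, the mark variogram and the mark covariance from them. This is an assumption-laden toy version: no edge correction or pair weighting is applied, and the function name `mark_correlations` and its dictionary keys are hypothetical.

```python
import math

def mark_correlations(points, marks, r_lo, r_hi):
    """Pair averages of real-valued marks over pairs with separation in [r_lo, r_hi)."""
    n_pairs = 0
    s_m = s_mm = s_m2 = 0.0          # running sums for <m>, <m1 m2>, <m^2>
    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            r = math.dist(points[a], points[b])
            if r_lo <= r < r_hi:
                m1, m2 = marks[a], marks[b]
                n_pairs += 1
                s_m += 0.5 * (m1 + m2)            # symmetrized single-mark average
                s_mm += m1 * m2
                s_m2 += 0.5 * (m1 * m1 + m2 * m2)
    if n_pairs == 0:
        raise ValueError("no pairs in this separation bin")
    m_p, mm_p, m2_p = s_m / n_pairs, s_mm / n_pairs, s_m2 / n_pairs
    mbar = sum(marks) / len(marks)   # overall mean mark
    return {
        "k_m": m_p / mbar,           # mean mark on pairs relative to overall mean
        "k_mm": mm_p / mbar ** 2,    # Stoyan's k_mm
        "var": m2_p - m_p ** 2,      # mark fluctuations on pairs
        "gamma": m2_p - mm_p,        # mark variogram, <(m1 - m2)^2 / 2>
        "cov": mm_p - m_p ** 2,      # mark covariance
    }
```

Because all five quantities are built from the same three pair averages, the relation var = gamma + cov holds identically for the returned values, which gives a simple consistency check.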

Before we move on to applications a few general remarks are in order. First, the
definition of these mark characteristics based on the conditional density $\mathcal{M}_2(\cdot)$
leads to ambiguities at $r$ equal to zero as discussed by |3l| , but there is no problem
for $r > 0$. Furthermore, suitable estimators for our test quantities are based
on estimators for the usual two-point correlation function |o^ , [l3| |7fl .
Mark-dependent clustering can also be defined at any $n$-point level. Mark-independent
clustering at every order is called the random labelling property jl^] .
Mark correlation functions based on the $n$-point densities may be used. For
discrete marks the multivariate J-functions (see Eq. (1.5)) are an interesting
alternative, sensitive to higher-order correlations. The random labelling property
then leads to the relation

$$ J_{i\circ}(r) = J_{\circ\circ}(r), \qquad (1.18) $$

which may be used as a test [p9[ .



1.2 Describing empirical data: some applications 

In many cases already the question whether one or the other type of dependence 
as outlined above applies to certain data sets is a controversial issue. In the 
following we will apply our test quantities to a couple of data sets in order 
to probe whether there is an interplay between some objects' marks and their 
positions in space. Other applications to biological, ecological, mineralogical, 
geological data can be found in | |57| , |60| , p3|p0[ | . 



1.2.1 Segregation effects in the distribution of galaxies 

The distribution of galaxies in space shows a couple of interesting features and
challenges theoretical models trying to understand cosmological structure formation
(see e.g. [|4|). There has been a long debate whether and how strongly
the clustering of galaxies depends on their luminosity and their morphological
type (see, e.g. p8| , |30|]27| ] ). The methods which have been used so far to establish
such claims were based on the spatial two-point correlation function; it was estimated
from different subsamples that were drawn from a catalogue and defined
by morphology or luminosity. However, some authors claimed that the signal
of luminosity segregation observed by others was a spurious effect, caused by
inhomogeneities in the sample and an inadequate choice of the statistics Q .
It could be shown, however, that methods based on the mark-correlation functions,
as discussed in Sect. 1.1.4, are not impaired by inhomogeneities, and a clear signal of
luminosity and morphology segregation was found.

In order to quantify segregation effects in the galaxy distribution we consider the
Southern Sky Redshift Survey 2 (SSRS 2, Q), which maps a significant fraction
of the sky and provides us with the angular sky positions, the distances (determined
via the redshifts), and some intrinsic properties of the galaxies such as
their flux and their morphological type. As marks we consider either a galaxy's
luminosity estimated from its distance and flux, or its morphological type. In the
latter case we effectively divide our sample into early-type galaxies (mainly elliptical
galaxies) and late-type galaxies (mainly spirals). In order to analyze homogeneous
samples, we focus on a volume-limited sample of $100\,h^{-1}$ Mpc depth (see footnote 1).
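The selection of a volume-limited sample can be sketched as follows. The example assumes the simple inverse-square relation $L = 4\pi d^2 f$ between luminosity, distance and flux, ignoring K-corrections and cosmological effects, and it represents each galaxy as a dictionary with hypothetical keys `"distance"` and `"flux"`. It is meant as an illustration of the idea only, not as the selection procedure actually applied to the SSRS 2.

```python
import math

def luminosity(flux, distance):
    """Luminosity from flux and distance, L = 4 pi d^2 f (no corrections applied)."""
    return 4.0 * math.pi * distance ** 2 * flux

def volume_limited(galaxies, depth, flux_limit):
    """Keep galaxies within `depth` that would still reach `flux_limit` at `depth`."""
    # limiting luminosity: the faintest object still observable at the sample depth
    l_min = luminosity(flux_limit, depth)
    return [g for g in galaxies
            if g["distance"] <= depth
            and luminosity(g["flux"], g["distance"]) >= l_min]
```

A galaxy is thus kept only if it lies inside the limiting depth and is luminous enough that it could have been observed at that depth, which removes the distance-dependent selection of faint objects.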



In a first step we ask whether the early- and the late-type galaxies form independent
sub-processes. In Figure 1.2 we show $J_{el}$ as a function of the distance $r$; it lies
far away from the value of one. Recalling Eq. (1.6), we conclude that the morphological
types of galaxies are not distributed independently on the sky. Not
surprisingly, the inequality $J_{el} < 1$ indicates positive interactions between the
galaxies of both morphological types; indeed galaxies attract each other through
gravity irrespective of their morphological types.

After having confirmed the presence of interactions between the different types of
galaxies, we tackle the issue whether the clustering of galaxies is different for different
galaxies. We consider the luminosities as marks (see Fig. 1.1). In Figure 1.3
we show some of the mark-weighted conditional correlation functions. Already
at first glance, they show evidence for luminosity segregation, relevant on scales
up to $15\,h^{-1}$ Mpc. To strengthen our claims, we redistribute the luminosities of
the galaxies within our sample randomly, holding the galaxy positions fixed. In
that way we mimic a marked point process with the same spatial clustering and
the same one-point distribution of the luminosities, but without luminosity segregation.
Comparing with the fluctuations around this null hypothesis, we see
that the signal within the SSRS 2 is significant.
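The reshuffling procedure described above can be sketched in a few lines. The example permutes the marks among fixed positions and compares the observed mean mark ratio on close pairs with the mean and standard deviation obtained from the permutations. It is a minimal illustration under assumptions of our own: a single separation range below a hypothetical `r_max`, a plain pair average without edge correction, and invented function names `mean_mark_ratio` and `reshuffling_test`.

```python
import math
import random

def mean_mark_ratio(points, marks, r_max):
    """Mean mark on pairs closer than r_max, divided by the overall mean mark."""
    total, n_pairs = 0.0, 0
    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            if math.dist(points[a], points[b]) < r_max:
                total += 0.5 * (marks[a] + marks[b])
                n_pairs += 1
    if n_pairs == 0:
        raise ValueError("no pairs closer than r_max")
    return (total / n_pairs) / (sum(marks) / len(marks))

def reshuffling_test(points, marks, r_max, n_shuffles=200, seed=0):
    """Observed ratio plus mean and standard deviation under random mark reshuffling."""
    rng = random.Random(seed)
    observed = mean_mark_ratio(points, marks, r_max)
    shuffled = list(marks)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)        # positions stay fixed, only marks are permuted
        null.append(mean_mark_ratio(points, shuffled, r_max))
    mean = sum(null) / n_shuffles
    std = math.sqrt(sum((x - mean) ** 2 for x in null) / (n_shuffles - 1))
    return observed, mean, std
```

Since the permutation keeps both the positions and the one-point mark distribution, any significant offset of the observed value from the reshuffled ones points to mark segregation rather than to the spatial clustering itself.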

The details of the mark correlation functions provide some further insight into
the segregation effects. The mean mark $k_m(r) > 1$ indicates that the luminous
galaxies are more strongly clustered than the dim ones. Our signal is scale-dependent
and decreasing for higher pair separations. The stronger clustering of
luminous galaxies is in agreement with earlier claims comparing the correlation
amplitude of several volume-limited samples Q .

1 One Mpc equals roughly 3.26 million light years. The number $h$ accounts for the
uncertainty in the measured Hubble constant and is about $h \approx 0.65$. Volume-limited
samples are defined by a limiting depth and a limiting luminosity. One considers only
those galaxies which could have been observed if they were located at the limiting
depth of the sample.

Fig. 1.2. The $J_{el}$ function of early-type (e) and late-type (l) galaxies vs. the galaxy
separation $r$ [Mpc/h] in a volume-limited sample of $100\,h^{-1}$ Mpc depth from the
SSRS 2 catalogue.

The $\mathrm{var}(r)$ being larger than the mark variance of the whole sample, $\sigma_M^2$, shows
that on galaxy pairs with separations smaller than $15\,h^{-1}$ Mpc the luminosity
fluctuations are enhanced. The fact that the mark segregation effect extends to
scales of up to $15\,h^{-1}$ Mpc is interesting on its own. In particular, it indicates that
galaxy clusters are not the only source of luminosity segregation, since typically
galaxy clusters are of the size of $3\,h^{-1}$ Mpc.

The signal for the covariance $\mathrm{cov}(r)$, however, could be due to galaxy pairs inside
clusters. It is relevant mainly on scales up to $4\,h^{-1}$ Mpc, indicating that the luminosities
on galaxy pairs with small separations tend to assume similar values.
Our results in part confirm claims by |)| , who compared the correlation functions
$\xi_2$ for different volume-limited subsamples and different luminosity classes
of the SSRS 2 catalog (see also §).
Fig. 1.3. The luminosity-weighted correlation functions for a volume-limited subsample
of the SSRS 2 with a depth of $100\,h^{-1}$ Mpc. The shaded areas denote the range of
one-$\sigma$ fluctuations for randomized marks around the case of no mark segregation. The
fluctuations were estimated from 1000 reshufflings of the luminosities.

1.2.2 Orientations of dark matter halos 

Many structures found in the Universe such as galaxies and galaxy clusters 
show anisotropic features. Therefore one can assign orientations to them and 
ask whether these orientations are correlated and form coherent patterns. Here 
we discuss a similar question on the basis of numerical simulations of large-scale 
structure (e.g., |Io| , |36| ). 

In such simulations the trajectories of massive particles are numerically inte- 
grated. These particles represent the dominant mass component in the Universe, 
the dark matter. Through gravitational instability high density peaks ("halos") 
form in the distribution of the particles; these halos are likely to be the places 
where galaxies originate. In the following we will report on alignment correlations 
between such halos p2[ , for a further application of mark correlation functions 
in this field see |2q] . 

The halos used by p2| stem from an N-body simulation in a periodic box with
a side length of $500\,h^{-1}$ Mpc. The initial and boundary conditions were fixed
according to a $\Lambda$CDM cosmology (for a discussion of cosmological models see
[EsUl^]). Halos were identified using a friends-of-friends algorithm in the dark
matter distribution. Not all of the halos found were taken into account; rather
the mass range and the spatial number density of the selected halos were chosen
to resemble the properties of observed galaxy clusters in the Reflex catalogue
[ fL2| . Typically our halos show a prolate distribution of their dark matter particles.
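The idea behind a friends-of-friends group finder can be sketched briefly: two particles are linked if they are closer than a chosen linking length, and a group is a set of particles connected through such links. The sketch below uses a simple union-find structure and a brute-force pair loop. It is a toy illustration only; the function name `friends_of_friends`, the parameter `linking_length` and the omission of the periodic boundary conditions are our own simplifications, not details of the simulation analysis cited above.

```python
import math

def friends_of_friends(points, linking_length):
    """Group point indices into connected components of the 'friendship' graph,
    where two points are friends if they are closer than `linking_length`."""
    parent = list(range(len(points)))

    def find(a):
        # follow parent links to the group representative (with path halving)
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            if math.dist(points[a], points[b]) < linking_length:
                parent[find(a)] = find(b)      # merge the two groups

    groups = {}
    for a in range(len(points)):
        groups.setdefault(find(a), []).append(a)
    # largest groups first; each group is a list of point indices
    return sorted(groups.values(), key=len, reverse=True)
```

Note that two particles can end up in the same group without being direct friends, as long as a chain of friends connects them; halos would then be the groups above some minimum size.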

For each halo the direction of the elongation is determined from the major axis
of the mass-ellipsoid. This leads to a marked point set where the orientation $\mathbf{l}_i$
is attached to each halo position $\mathbf{x}_i$ as a vector mark with $|\mathbf{l}_i| = 1$. Details can
be found in [p2[ .
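One way to obtain such an orientation is sketched below: the major axis is taken as the eigenvector belonging to the largest eigenvalue of the second-moment tensor of the particle positions about their centre of mass, found here by power iteration. The sketch assumes equal-mass particles in three dimensions and a clearly elongated distribution; the function name `major_axis`, the start vector and the iteration count are arbitrary choices of ours, and the cited work may define the mass ellipsoid differently.

```python
import math

def major_axis(particles, n_iter=200):
    """Unit vector along the major axis of the mass ellipsoid of equal-mass particles."""
    n = len(particles)
    centre = [sum(p[k] for p in particles) / n for k in range(3)]
    # second-moment tensor of the particle positions about the centre of mass
    t = [[sum((p[a] - centre[a]) * (p[b] - centre[b]) for p in particles) / n
          for b in range(3)] for a in range(3)]
    v = [1.0, 0.7, 0.3]                      # arbitrary start vector
    for _ in range(n_iter):                  # power iteration towards the largest eigenvalue
        w = [sum(t[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v
```

The sign of the returned vector carries no meaning, since an ellipsoid only defines an axis; this is why the absolute value of the scalar product is the natural choice when comparing two such orientations.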

In Fig. 1.4 the vector-mark correlation functions as defined in Eqs. (1.14), (1.16),
and (1.17) are shown. Since only the orientation of the mass ellipsoids can be
determined, we use $\mathcal{A}'(r)$ (Eq. (1.15)) instead of $\mathcal{A}(r)$. The signal in $\mathcal{A}'(r)$ indicates
that pairs of halos with a distance smaller than $30\,h^{-1}$ Mpc show a tendency
of parallel alignment of their orientations $\mathbf{l}_1$, $\mathbf{l}_2$. The deviation from a pure
random alignment is in the percent range but clearly outside the random fluctuations.
The alignment of the halos' orientations $\mathbf{l}_1$, $\mathbf{l}_2$ with the connecting vector
$\hat{\mathbf{r}}$ quantified by $\mathcal{F}(r)$ is significantly stronger; it is particularly interesting that
this alignment effect extends to scales of about $100\,h^{-1}$ Mpc.
In a qualitative picture this may be explained by halos aligned along the filaments
of the large-scale structure. Indeed such filaments are prominent features
found in the galaxy distribution |}2| and in N-body simulations jllj , often with
a length of up to $100\,h^{-1}$ Mpc. The lowered $\mathcal{P}(r)$ indicates that the volume of
the rhomboid given by $\mathbf{l}_1$, $\mathbf{l}_2$ and $\hat{\mathbf{r}}$ is reduced for halo pairs with a separation
below $80\,h^{-1}$ Mpc. Already a preferred alignment of $\mathbf{l}_1$, $\mathbf{l}_2$ along $\hat{\mathbf{r}}$ leads to such a
reduction, similar to a plane-like arrangement of $\mathbf{l}_1$, $\mathbf{l}_2$, $\hat{\mathbf{r}}$. For the halo distribution
the signal in $\mathcal{P}(r)$ seems to be dominated by the filamentary alignment.
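The three alignment measures used here can be sketched for unit vector marks in three dimensions. The code below assumes the following reading of the definitions: the parallel alignment is the pair average of $|\mathbf{l}_1 \cdot \mathbf{l}_2|$, the filamentary alignment is half the pair average of $|\mathbf{l}_1 \cdot \hat{\mathbf{r}}| + |\mathbf{l}_2 \cdot \hat{\mathbf{r}}|$, and the planar alignment is half the pair average of $|\mathbf{l}_1 \cdot (\mathbf{l}_2 \times \hat{\mathbf{r}})| / |\mathbf{l}_2 \times \hat{\mathbf{r}}|$ plus the same term with the indices exchanged. With $|\mathbf{l}_i| = 1$ the normalization by the mean mark length drops out. Setting the planar term to zero when a mark is parallel to $\hat{\mathbf{r}}$ is a convention of this sketch, and all function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def planar_term(l_a, l_b, r_hat):
    """|l_a . (l_b x r_hat)| / |l_b x r_hat|, taken as 0 if l_b is parallel to r_hat."""
    c = cross(l_b, r_hat)
    norm = math.sqrt(dot(c, c))
    return abs(dot(l_a, c)) / norm if norm > 1e-12 else 0.0

def vector_mark_correlations(points, marks, r_lo, r_hi):
    """Pair averages (A', F, P) over pairs with separation in [r_lo, r_hi).
    Marks are assumed to be unit vectors in three dimensions."""
    s_a = s_f = s_p = 0.0
    n_pairs = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            r_vec = [p - q for p, q in zip(points[i], points[j])]
            r = math.sqrt(dot(r_vec, r_vec))
            if not (r_lo <= r < r_hi) or r == 0.0:
                continue
            r_hat = [x / r for x in r_vec]
            l1, l2 = marks[i], marks[j]
            s_a += abs(dot(l1, l2))                                    # parallel alignment
            s_f += 0.5 * (abs(dot(l1, r_hat)) + abs(dot(l2, r_hat)))   # filamentary
            s_p += 0.5 * (planar_term(l1, l2, r_hat) + planar_term(l2, l1, r_hat))
            n_pairs += 1
    if n_pairs == 0:
        raise ValueError("no pairs in this separation bin")
    return s_a / n_pairs, s_f / n_pairs, s_p / n_pairs
```

Two marks pointing along their connecting line give the largest filamentary value and a vanishing planar value, in line with the remark that filamentary alignment lowers the planar measure; two mutually perpendicular marks that are also perpendicular to the connecting line give the largest planar value.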
The question whether there are non-trivial orientation patterns for galaxies or
galaxy clusters has been discussed for a long time. |llj reported a significant
alignment of the observed galaxy clusters out to $100\,h^{-1}$ Mpc. [62, 63], however,
claimed that this effect is small and likely to be caused by systematics; [67] find no
indication for alignment effects at all. Subsequently several authors purported
to have found signs of alignments in the galaxy and galaxy cluster distribution
(see e.g. pl| , p7| , ^j2Stl ). Our Fig. 1.4 shows that from simulations significant
large-scale correlations are to be expected in the orientations of galaxy clusters,
in agreement with the results by [fllfl . These results are also supported by a
simulation study carried out by |4q| .



1.2.3 Martian Craters 

Let us now turn to another, still astrophysical, but significantly closer object:
Mars (see Figure 1.5). Many planets' surfaces display impact craters with
diameters up to ~260 km and a broad range of inner morphologies. These craters
are surrounded by ejecta forming different types of patterns. The craters and
their ejecta are likely to be caused by asteroids and periodic comets crossing
the planet's orbit, falling onto the planet's surface, and spreading some of the
underlying surface material around the original impact crater. A variety of
different crater morphologies and a wide range of ejecta patterns can be found.
In principle, either the different impact objects (especially their energies)
or the various surface types of the planet may explain the repertory of patterns
observed. Whereas the energy variations of impact objects do not cause any
peculiarities in the spatial distribution of the craters (apart from a possible
latitude dependence), geographic inhomogeneities are expected to give rise to
inhomogeneities in the craters' morphological properties.

Fig. 1.4. The correlations of halo orientations in numerical simulations. The orientation
of each dark matter halo, specified by the direction of the major axis $\mathbf{l}$ of the mass
ellipsoid, is used as a vector mark. The dashed area is obtained by randomizing the
orientations among the halos.

We try to answer the question of the ejecta patterns' origin using data collected
by H who already found correlations between crater characteristics and the
local surface type employing geologic maps of Mars. Complementary to their
approach, we investigate two-point properties without any reference to geologic
Mars maps. We restrict ourselves to craters which have a diameter larger
than 8 km and whose ejecta pattern could be classified, ending up with 3527
craters spread out all over the Martian surface. We use spherical distances for
our analysis of pairs.

Fig. 1.5. The Martian surface with its craters. Whereas the left panel (from
http://pds.jpl.nasa.gov/planets/captions/mars/schiap.htm) illustrates the various
geological settings to be found on the planet's surface, the other panels focus on a small
patch and show the craters together with their radii (middle panel, the sizes of the
symbols are proportional to the radii of the craters) and together with the craters' types
(right panel, simple morphology as quadrangles and more complex craters as stars).
The latter viewgraphs rely on the data by ||.
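To make the notion of a spherical pair distance concrete, the following is a minimal sketch of a great-circle distance between two crater positions given by latitude and longitude. The function name, the degree-based interface and the mean radius of Mars used here (3389.5 km) are assumptions made for illustration; the text does not specify how the distances were actually computed.

```python
import math

# Assumed mean radius of Mars in km; this value is not taken from the text.
MARS_MEAN_RADIUS_KM = 3389.5


def spherical_distance_km(lat1_deg, lon1_deg, lat2_deg, lon2_deg,
                          radius_km=MARS_MEAN_RADIUS_KM):
    """Great-circle distance between two points on a sphere (haversine formula).

    Latitudes and longitudes are given in degrees; the result is in km.
    """
    phi1 = math.radians(lat1_deg)
    phi2 = math.radians(lat2_deg)
    dphi = phi2 - phi1
    dlam = math.radians(lon2_deg - lon1_deg)
    h = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    # min() guards against rounding errors pushing sqrt(h) slightly above 1
    return 2.0 * radius_km * math.asin(min(1.0, math.sqrt(h)))
```

With such a function, the pair separation $r$ entering the mark correlation functions is the arc length along the surface rather than the chord through the planet.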

In a first step we divide the ejecta patterns into two broad classes consisting of
either the simple patterns (single and double lobe morphology, i.e. SL and DL in
terms of the classification by ; we speak of "simple craters") or the remaining,
more complex configurations ("complex craters"). Using our conditional cross
correlation functions $C_{ij}$ as defined in Equation (1.8), we see a highly
significant signal for mark correlations (Figure 1.6). At small separations, crater
pairs are disproportionately built up of simple craters at the expense of cross
correlations. This can be explained assuming that crater formation depends on the
local surface type: if the simple craters are more frequent in certain geological
environments than in others, then there are also more pairs of them to be found
as long as one focuses on distances smaller than the typical scale of one geological
surface type. Cross pairs are suppressed, since typical pairs with small
separations belong to one geological setting where the simple craters either
dominate or do not. Only a small, positive segregation signal occurs for the
complex craters. Hence our analysis indicates that the broad class of complex
craters is distributed quite homogeneously over all of the geologies. On top of
this there are probably simple craters, their frequency significantly depending
on the surface type.
If the ejecta patterns were independent of the surface, no mark segregation could
be observed (other sources of mark segregation are unlikely, since the Martian
craters are a result of a long bombardment history diluting any eventual
peculiar crater correlations). In this sense, the signal observed indicates a
surface-dependence of crater formation. This result is remarkable, given that we
did not use any geological information on Mars at all. The picture emerging could
be described using the random field model, where a field (here the surface type)
determines the mark of the points (see below).




Fig. 1.6. The conditional cross-correlation functions for Martian craters. We split
the sample of craters into two broad classes according to their ejecta types: simple
morphologies (S) consisting of SL and DL types, and complex morphologies (C) with
all other types (see |j| for details). The results indicate that at scales up to about 1500
km the clustering of the simple craters is enhanced at the expense of cross correlations.
The shaded areas denote the one-$\sigma$ fluctuations for randomized marks estimated from
100 realizations of the mark reshuffling.



In a second step, we analyze the interplay between the craters' diameters and
their spatial clustering. Now the diameter serves as a continuous mark. The
results in Figure 1.7 show a clear signal for mark segregation in $k_m$ and cov at
small scales. The latter signals that pairs with separations in a broad range up
to 1700 km tend to have similar diameters; this is in agreement with the earlier
picture: as H showed, the simple craters are mostly small-sized. Pairs with
relatively small separations thus often stem from the same geological setting and
therefore have similar diameters and similar morphological type.

Fig. 1.7. The radius-weighted correlation functions for craters on Mars. The radius of
each crater serves as a mark, $r$ is the spherical distance. The shaded areas denote the
one-$\sigma$ fluctuations for randomized marks estimated from 100 realizations of the mark
reshuffling.

Also the signal of $k_m$ seems to support this picture: since the simple craters
are more strongly clustered than the other ones and since they have smaller
diameters, one could expect $k_m < 1$. As we shall see in Sect. 1.3, however, a
$k_m \neq 1$ contradicts the random field model; therefore, the mark-dependence on
the underlying surface type (thought of as a random field) cannot account for
the signal observed. Thus, we have to look for an alternative explanation: it
seems reasonable that, whenever a crater is found somewhere, no other crater
can be observed close by (because an impact close to an existing crater will
either destroy the old one or cover it with ejecta such that it is not likely to
be observed as a crater). This results in a sort of effective hard-core repulsion.
This repulsion should be larger for larger craters. Thus, pairs with very small
separations can only be formed by small craters, therefore $k_m < 1$ for tiny $r$.
The scale beyond which $k_m(r) \approx 1$ should somehow be hidden within the crater
diameter distribution. Indeed, at about 500 km the segregation vanishes, which
is about twice the largest diameter in our sample. Taking into account that the
ejecta patterns extend beyond the crater, this seems to be a reasonable
agreement. As shown in Sect. 1.3.1 a model based on these considerations is able to
produce such a depletion in $k_m(r)$. This effect could also in turn explain part
of the cross correlations observed earlier in Figure 1.6. A similar effect is to be
expected for the mark variance. Close pairs are only accessible to craters with a
smaller range of diameters; therefore, their variance is diminished in comparison
to the whole sample. However, an effect like this is barely visible in the data.
Altogether, the crater distribution is dominated by two effects: the type of the
ejecta pattern and the crater diameter depend on the surface, and in addition
there is a sort of repulsion effect on small scales.
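The significance bands quoted in the figure captions rest on reshuffling the marks among the fixed positions. The following is a minimal sketch of that idea for $k_m$ in a single distance bin, assuming that $k_m(r)$ is the mean mark of the members of pairs at separation $r$ divided by the overall mean mark. The function names, the naive pair loop, the single bin and the flat-space Euclidean distance are simplifying assumptions made here (for the craters a spherical distance would take its place); this is not the estimator actually used for the figures.

```python
import math
import random


def k_m(points, marks, r_lo, r_hi):
    """Mean mark of pair members with separation in [r_lo, r_hi),
    divided by the overall mean mark (naive O(N^2) pair count)."""
    mean_mark = sum(marks) / len(marks)
    total, n_pairs = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if r_lo <= math.dist(points[i], points[j]) < r_hi:
                total += 0.5 * (marks[i] + marks[j])
                n_pairs += 1
    return total / (n_pairs * mean_mark) if n_pairs else float("nan")


def reshuffled_band(points, marks, r_lo, r_hi, n_real=100, seed=0):
    """Mean and standard deviation of k_m when the marks are randomly
    permuted among the fixed positions (no mark segregation by construction)."""
    rng = random.Random(seed)
    shuffled = list(marks)
    values = []
    for _ in range(n_real):
        rng.shuffle(shuffled)
        values.append(k_m(points, shuffled, r_lo, r_hi))
    mean = sum(values) / n_real
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n_real - 1))
    return mean, std
```

A measured value of `k_m` lying several standard deviations away from the reshuffled mean would then be read as a signal for mark segregation in that bin.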



1.2.4 Pores in Sandstone 




Fig. 1.8. The pores within a Fontainebleau sandstone sample. Note that this is a
negative image, where the pores are displayed in grey. The geometrical features of the
pore network are important for macroscopic properties of the stone. In this sample the
pores occupy 13% of the volume. The size of the whole sample shown is about 1.5 mm$^3$
(Courtesy M. Knackstedt).



Now we turn to systems on smaller scales. Sandstone is an example of a porous
medium and has been investigated extensively, mainly because oil was found in
the pore network of similar stones. In order to extract the oil from the stone one
can try to wash it out using a second liquid, e.g. water. Therefore, one tries to
understand from a theoretical point of view how the microscopic geometry of the
pore network determines the macroscopic properties of such a multi-phase flow.
Especially the topology and connectivity of the microcaves and tunnels prove
to be crucial for the flow properties at macroscopic scales. Details are given, for
instance, in the contributions by C. Arns et al., H.-J. Vogel et al. and J. Ohser
in this volume. A sensible physical model, therefore, in the first place has to rely
on a thorough description of the pore pattern.

One way to understand the pore network is to think of it as a union of simple 
geometrical bodies. Following [53], one can identify distinct pores together with
their position and their pore radius or extension. This allows us to understand 
the pore structure in terms of a marked point process, where the marks are the 
pore radii. 

In the following, we consider three-dimensional data taken from one of the
Fontainebleau sandstone samples through synchrotron X-ray tomography. These
data trace a 4.52 mm diameter cylindrical core extracted from a block with bulk
porosity $\phi = 13\%$, where the bulk porosity is the volume fraction occupied by
the pores. A piece with 2.91 mm length (resulting in a 46.7 mm$^3$ volume) of the
core was imaged and tomographically reconstructed p3|j5^ ,p|,p| . Further details
of this sample are presented in the contribution by C. Arns et al. in this volume.
Based on the reconstructed images the positions of pores and their radii were
identified as described in [53].
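As a quick consistency check of the numbers quoted for the imaged piece (a worked calculation added here, not part of the original analysis), the volume of a cylinder with diameter $d = 4.52$ mm and length $L = 2.91$ mm is

```latex
V = \pi \left(\frac{d}{2}\right)^{2} L
  = \pi \,(2.26\,\mathrm{mm})^{2} \times 2.91\,\mathrm{mm}
  \approx 46.7\,\mathrm{mm}^{3} ,
```

in agreement with the imaged volume stated above.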

In our results for the mark correlation functions a strong depletion of $k_m(r)$
and var$(r)$ is visible for $r < 200\,\mu$m in Fig. 1.10. This small-scale effect may be
explained similarly to the Martian craters: large pores are never found close to
each other, since they have to be separated by at least the sum of their radii.
The histogram of the pore radii in Fig. 1.9 shows that most of the pores have
radii smaller than $100\,\mu$m, and consequently this effect is confined to $r < 200\,\mu$m.
In Sect. 1.3.1 we discuss the Boolean depletion model, which is based on this
geometric constraint and is able to produce such a reduction in $k_m(r)$. This
purely geometric constraint also explains the reduced var$(r)$ and increased
covariance cov$(r)$. For separations larger than $200\,\mu$m there is no signal from the
covariance, but both $k_m(r)$ and var$(r)$ show a small increase out to ~$1000\,\mu$m.
This indicates that pairs of pores out to these separations tend to be larger in
size and show slightly increased fluctuations. However, this effect is small (of the
order of 1%) and may be explained by the definition of the holes, which may
lead to "artificial small pores" as "bridges" between larger ones. This hypothesis
has to be tested using different hole definitions. In any case the main conclusion
seems to be that apart from the depletion effect at small scales there are no
other mark correlations.

Fig. 1.9. The empirical one-point distribution $\mathcal{M}_1$ of the pore sizes.

Fig. 1.10. The mark-weighted correlation functions from the holes in the Fontainebleau
sandstone. The pores' radii serve as marks. The $k_m$ being smaller than one indicates
a depletion effect. The shaded areas again denote the range of one-$\sigma$ fluctuations for
randomized marks around the case of no mark segregation. The fluctuations were
estimated from 200 reshufflings of the radii.



1.3 Models for marked point processes

Given the significant mark correlations found in various applications, one may
ask how these signals can be understood in terms of stochastic models. A
thorough understanding of course requires a physical modeling of the individual
situation. There are, however, some generic models, which we will focus on in
the following: in Sect. 1.3.1 we introduce the Boolean depletion model, which
is able to explain some of the features observed in the distribution of craters
and pores in sandstone. Another generic model is the random field model where
the marks of the points stem from an independent random field (Sect. 1.3.2).
In Sect. 1.3.3 we generalize the idea behind the random field model further in
order to get the Cox random field model, which allows for correlations between
the point set and the random field. Other model classes and their applications
are discussed by e.g. Q^^gf^.



1.3.1 The Boolean depletion model 

In our analysis of the Martian craters and the holes in sandstone, we found that
for small separations only small craters, or small holes in the sandstone, could
be found. We interpreted this as a pure geometric selection effect. The Boolean
depletion model is able to quantify this effect, but also shows further interesting
features.

The starting point is the Boolean model of overlapping spheres $B_R(\mathbf{x})$ (see also
the contributions by C. Arns et al. and D. Hug in this volume as well as |5(|).
For that, the spheres' centers $\mathbf{x}_i$ are generated randomly and independently, i.e.
according to a Poisson process of number density $\varrho_0$. The radii $R$ of the spheres
are then chosen independently according to a distribution function $F_0(R)$, i.e.
with probability density $f_0(R) = \partial F_0(R)/\partial R$. The main idea behind the depletion is
to delete spheres which are covered by other spheres. To make this procedure
unique we remove only those spheres which are completely covered by a (notably
larger) sphere$^{12}$. The positions and radii of the remaining spheres define a marked
point process. Note that this depletion mechanism is minimal in the sense that
a lot of overlapping spheres may remain. This Boolean depletion model may be
considered as the low-density limit of the well-known Widom-Rowlinson model,
or (more generally) of non-additive hard sphere mixtures (see |7^ , [40| , j39[ ).
The probability that a sphere of radius $R$ is not removed is then given by

$$ f_{\mathrm{nr}}(R) = \lim_{N\to\infty} \left[ 1 - \frac{\omega_d}{|\Omega|} \int_0^\infty \mathrm{d}R'\, f_0(R')\, \Theta(R'-R)\,(R'-R)^d \right]^N = \exp\left( -\varrho_0\, \omega_d \int_0^\infty \mathrm{d}x\, f_0(R+x)\, x^d \right) , \tag{1.19} $$

with the step function $\Theta(x) = 0$ for $x < 0$ and $\Theta(x) = 1$ otherwise, and the
volume of the $d$-dimensional unit ball $\omega_d$ ($\omega_1 = 2$, $\omega_2 = \pi$, $\omega_3 = 4\pi/3$). The limit
in Eq. (1.19) is performed by keeping $\varrho_0 = N/|\Omega|$ constant, with $N$ the initial
number of spheres and $|\Omega|$ the volume of the domain.

12 This process can be thought of as a dilution of the original Poisson process; for some
general remarks on diluting Poisson processes see ]5q| , p. 163. A comparable model
was considered by [pST.
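The number density of the remaining spheres reads

$$ \varrho = \varrho_0 \int_0^\infty \mathrm{d}R\, f_0(R)\, f_{\mathrm{nr}}(R) , \tag{1.20} $$

where the one-point probability density $\mathcal{M}_1(R)$ that a sphere has radius $R$ is
given by

$$ \mathcal{M}_1(R) = f_0(R)\, f_{\mathrm{nr}}(R)\, \frac{\varrho_0}{\varrho} . \tag{1.21} $$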

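The probability that both of the spheres $B_{R_1}(\mathbf{x}_1)$ and $B_{R_2}(\mathbf{x}_2)$ survive, i.e. that
neither of them is removed, is given by

$$ f_{\mathrm{nr}}(\mathbf{x}_1, R_1; \mathbf{x}_2, R_2) = \begin{cases} 0 & \text{if } r < |R_2 - R_1| , \\ \exp\left( -\varrho_0\, g_m(\mathbf{x}_1, R_1; \mathbf{x}_2, R_2) \right) & \text{otherwise} , \end{cases} \tag{1.22} $$

where $r = |\mathbf{x}_2 - \mathbf{x}_1|$, with $B_{R<0}(\mathbf{x}) = \emptyset$ and

$$ g_m(\mathbf{x}_1, R_1; \mathbf{x}_2, R_2) = \int_0^\infty \mathrm{d}x\, V\left( B_{x-R_1}(\mathbf{x}_1) \cup B_{x-R_2}(\mathbf{x}_2) \right) f_0(x) . \tag{1.23} $$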
At this point we have to consider the set union of two spheres with radii $b_1 =
x - R_1$ and $b_2 = x - R_2$, respectively; the volume of this geometrical configuration
can be calculated; in three dimensions, e.g., we have:

$$ V\left( B_{b_1}(\mathbf{x}_1) \cup B_{b_2}(\mathbf{x}_2) \right) = \frac{2\pi}{3} \left( b_1^3 + b_2^3 \right) - \frac{2\pi}{3} \left( \frac{r^3}{8} - \frac{3}{4}\, r \left( b_1^2 + b_2^2 \right) - \frac{3}{8r} \left( b_1^2 - b_2^2 \right)^2 \right) \tag{1.24} $$

for $|b_2 - b_1| < r = |\mathbf{x}_2 - \mathbf{x}_1| < b_1 + b_2$. Otherwise this volume reduces either to
the volume of the larger sphere ($r < |b_2 - b_1|$) or to the sum of both spherical
volumes ($r > b_1 + b_2$).
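The three cases for the volume of the union of two spheres can be checked numerically. The following minimal sketch evaluates the volume in three dimensions, using the partial-overlap expression of Eq. (1.24) and the two limiting cases described in the text; the function name and the clamping of negative radii to zero (mirroring the convention that a sphere of negative radius is empty) are choices made here for illustration.

```python
import math


def union_volume(b1, b2, r):
    """Volume of the union of two 3D balls with radii b1 and b2 whose
    centres are a distance r apart."""
    # a ball with negative radius is treated as the empty set
    b1, b2 = max(b1, 0.0), max(b2, 0.0)

    def ball(b):
        return 4.0 * math.pi / 3.0 * b ** 3

    if r >= b1 + b2:             # disjoint (or touching) balls: volumes add
        return ball(b1) + ball(b2)
    if r <= abs(b2 - b1):        # the larger ball contains the smaller one
        return ball(max(b1, b2))
    # partial overlap, Eq. (1.24)
    bracket = (r ** 3 / 8.0
               - 0.75 * r * (b1 ** 2 + b2 ** 2)
               - 3.0 * (b1 ** 2 - b2 ** 2) ** 2 / (8.0 * r))
    return 2.0 * math.pi / 3.0 * (b1 ** 3 + b2 ** 3 - bracket)
```

For two unit balls at unit distance this gives $9\pi/4$, i.e. twice the unit-ball volume minus the lens-shaped intersection $5\pi/12$, and the expression joins continuously onto both limiting cases.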

Similarly as in Eq. (1.20) the spatial two-point density turns out to be

$$ \varrho_2^S(\mathbf{x}_1, \mathbf{x}_2) = \varrho_0^2 \int_0^\infty \mathrm{d}R_1 \int_0^\infty \mathrm{d}R_2\, f_0(R_1)\, f_0(R_2)\, f_{\mathrm{nr}}(\mathbf{x}_1, R_1; \mathbf{x}_2, R_2) , \tag{1.25} $$

such that the conditional two-point mark density simply reads

$$ \mathcal{M}_2(R_1, R_2 | \mathbf{x}_1, \mathbf{x}_2) = f_0(R_1)\, f_0(R_2)\, f_{\mathrm{nr}}(\mathbf{x}_1, R_1; \mathbf{x}_2, R_2)\, \frac{\varrho_0^2}{\varrho_2^S(\mathbf{x}_1, \mathbf{x}_2)} . \tag{1.26} $$

From this we can derive all of the mark correlation functions from Sect. 1.1.4.

A bimodal distribution: In order to get an analytically tractable model we
adopt a bimodal radius distribution in the original Boolean model and start
therefore with

$$ f_0(R) = \alpha_0\, \delta(R - R_1) + (1 - \alpha_0)\, \delta(R - R_2) , \tag{1.27} $$

where we assume that $R_1 < R_2$. Due to the depletion the number density $\varrho$ of
the spheres as well as the probability $\alpha$ to find the smaller radius $R_1$ at a given
point are then lowered; we get

$$ \alpha = \alpha_0\, \frac{e^{-n}}{1 - \alpha_0 + \alpha_0 e^{-n}} < \alpha_0 , \tag{1.28} $$

$$ \varrho = \varrho_0 \left( 1 - \alpha_0 + \alpha_0 e^{-n} \right) = \varrho_0\, \frac{1 - \alpha_0}{1 - \alpha} , \tag{1.29} $$

with $n = \varrho (1 - \alpha) \frac{4\pi}{3} (R_2 - R_1)^3$. Altogether, the bimodal model can be
parameterized in terms of the radii $R_1$, $R_2$, the ratio $\alpha_0 \in [0, 1]$ and the density
$\varrho_0 \in \mathbb{R}^+$. The latter two quantities, however, are not observable from the final
point process; therefore we convert them into the parameters $\alpha \in [0, 1]$ and
$\varrho \in \mathbb{R}^+$, so that all other quantities can be expressed in terms of these, for instance,

$$ \alpha_0 = \frac{\alpha}{\alpha + (1 - \alpha)\, e^{-n}} \quad \text{and} \quad \varrho_0 = \varrho \left( \alpha\, e^{n} + (1 - \alpha) \right) . $$
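The conversion between the initial parameters $(\alpha_0, \varrho_0)$ and the observable ones $(\alpha, \varrho)$ can be sketched as a pair of mutually inverse functions. The sketch below is for three dimensions and uses the fact that the large spheres are never removed, so that $\varrho (1 - \alpha) = \varrho_0 (1 - \alpha_0)$ and $n$ can be computed from either parameter set; the function names are chosen here for illustration only.

```python
import math


def depleted_parameters(alpha0, rho0, r1, r2):
    """Map the initial (alpha0, rho0) to the observable (alpha, rho) and n,
    following Eqs. (1.28) and (1.29) in three dimensions."""
    # density of large spheres is unchanged by the depletion
    n = rho0 * (1.0 - alpha0) * 4.0 * math.pi / 3.0 * (r2 - r1) ** 3
    surviving = 1.0 - alpha0 + alpha0 * math.exp(-n)  # surviving fraction
    alpha = alpha0 * math.exp(-n) / surviving
    rho = rho0 * surviving
    return alpha, rho, n


def initial_parameters(alpha, rho, r1, r2):
    """Inverse map from the observable (alpha, rho) back to (alpha0, rho0)."""
    n = rho * (1.0 - alpha) * 4.0 * math.pi / 3.0 * (r2 - r1) ** 3
    alpha0 = alpha / (alpha + (1.0 - alpha) * math.exp(-n))
    rho0 = rho * (alpha * math.exp(n) + 1.0 - alpha)
    return alpha0, rho0
```

Applying `initial_parameters` to the output of `depleted_parameters` should return the starting values up to rounding, which is a convenient check on the algebra.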



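From Eq. (1.21) we determine the mean mark, i.e. the mean radius of the spheres

$$ \overline{m} = \overline{R} = \alpha R_1 + (1 - \alpha) R_2 , \tag{1.30} $$

and from Eq. (1.25) the spatial product density

$$ \varrho_2^S(r) = \varrho^2 \begin{cases} (1 - \alpha)^2 + \alpha^2 \exp(n I(x)) & 0 \le x < 1 , \\ 1 + \alpha^2 \left[ \exp(n I(x)) - 1 \right] & 1 \le x < 2 , \\ 1 & 2 \le x , \end{cases} \tag{1.31} $$

with the normalized intersectional volume $I(x) = 1 - \frac{3}{4} x + \frac{1}{16} x^3$ of two spheres
and $x = r/(R_2 - R_1)$. Finally, using Eq. (1.26) one can calculate the mark correlation
functions, e.g.

$$ k_m(r) = \begin{cases} 1 - \alpha (1 - \alpha)\, \dfrac{R_2 - R_1}{\overline{R}}\, \dfrac{\alpha \exp(n I(x)) - (1 - \alpha)}{(1 - \alpha)^2 + \alpha^2 \exp(n I(x))} & 0 \le x < 1 , \\[2ex] 1 - \alpha^2 (1 - \alpha)\, \dfrac{R_2 - R_1}{\overline{R}}\, \dfrac{\exp(n I(x)) - 1}{1 + \alpha^2 \left[ \exp(n I(x)) - 1 \right]} & 1 \le x < 2 , \\[2ex] 1 & 2 \le x . \end{cases} \tag{1.32} $$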
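To get a feeling for the behaviour of $k_m(r)$ in the bimodal Boolean depletion model, the following minimal sketch evaluates it in three dimensions, taking $k_m(r)$ as the mean mark of the members of pairs at separation $r$ divided by the mean mark $\overline{R}$. The three branches follow from weighting the large-large, small-large and small-small pair contributions that make up the product density (1.31). The function name and interface are assumptions made here; the sketch is not meant as a reproduction of the curves in the figures.

```python
import math


def k_m_bimodal(r, alpha, rho, r1, r2):
    """k_m(r) for the bimodal Boolean depletion model in three dimensions.

    alpha: fraction of small spheres (radius r1) after depletion,
    rho:   number density after depletion, with r1 < r2.
    """
    delta = r2 - r1
    x = r / delta
    if x >= 2.0:
        return 1.0
    n = rho * (1.0 - alpha) * 4.0 * math.pi / 3.0 * delta ** 3
    # exp(n I(x)) with the normalized intersection volume I(x)
    e = math.exp(n * (1.0 - 0.75 * x + x ** 3 / 16.0))
    mean_radius = alpha * r1 + (1.0 - alpha) * r2
    pref = alpha * (1.0 - alpha) * delta / mean_radius
    if x < 1.0:
        # small-large pairs are forbidden below r = r2 - r1
        return 1.0 - pref * (alpha * e - (1.0 - alpha)) / (
            (1.0 - alpha) ** 2 + alpha ** 2 * e)
    return 1.0 - pref * alpha * (e - 1.0) / (1.0 + alpha ** 2 * (e - 1.0))
```

With $R_1 = 0.05$, $R_2 = 0.15$, $\varrho = 500$ and $\alpha = 0.5$ the value at $r = 0$ is roughly 0.76, whereas for $\alpha = 0.1$ the same call gives a value above one, so both signs of the deviation from one can occur.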



In Fig. 1.11 the $k_m(r)$ function from the Boolean depletion model is shown. The
model with the solid line illustrates that a reduced $k_m(r)$ for small radii can be
obtained by simply removing smaller spheres. At least qualitatively this model is
able to explain the depletion effects we have seen both in the distribution of
Martian craters (Fig. 1.7) and in the distribution of pores in sandstone (Fig. 1.10).
The jump at $r = R_2 - R_1$ is a relict of the strictly bimodal distribution with
only two radii. Fig. 1.11 also shows that the Boolean depletion model is quite
flexible, allowing for a $k_m(r) < 1$, but also $k_m(r) > 1$ is possible.
Without ignoring the considerable difference of this Boolean depletion model to
the pore size distribution in real sandstones (see Figures 1.8-1.10) one may still
recognize some interesting similarities: This simple model explains naturally a
decrease of $k_m(r)$ if the distribution of the radii is symmetric ($\alpha = 1/2$). As
visible in Figure 1.9 this is approximately the case for the pore radii. Moreover,
note that even quantitative features are captured correctly, indicating that the
decrease of $k_m(r)$ visible in Figure 1.10 is indeed due to a depletion effect. For
instance, the decrease starts at $r \approx R_M$ where $R_M \approx 100\,\mu$m is the largest
occurring radius (see the histogram in Figure 1.9) and the value of $k_m(0) \approx 0.8$ at
$r = 0$ is in accordance with Equation (1.32) assuming that $R_2 - R_1 \approx \overline{R}$ and the
normalized density of pores $n \approx 1$ necessary for a connected network. Of course
a more detailed analysis is necessary based on Eqs. (1.21) and (1.26) and the
histogram shown in Figure 1.9.




Fig. 1.11. The $k_m(r)$ function for the Boolean depletion model with parameters $R_1 =
0.05$, $R_2 = 0.15$, $\varrho = 500$, and $\alpha = 0.5$ (solid line), $\alpha = 0.3$ (dotted line), $\alpha = 0.1$
(dashed line).



1.3.2 The random field model 



The "random-field model" covers a class of models motivated from fields such as
geology (see, e.g., |7Cj). The level of the ground water, for instance, is thought
of as a realization of a random field which may be directly sampled at points
(hopefully) independent from the value of the field or which may influence the
size of a tree in a forest.

In general, a realization of the random field model is constructed from a
realization of a point process and a realization of a random field $u(\mathbf{x})$. The mark of
each object located at $\mathbf{x}_i$ traces the accompanying random field via $m_i = u(\mathbf{x}_i)$.
The crucial assumption is that the point process is stochastically independent
from the random field.

We denote the mean value of the homogeneous random field by $\overline{u} = \mathbb{E}[u(\mathbf{x})] = \overline{u^1}$
and the moments by $\overline{u^k} = \int \mathrm{d}u\, w(u)\, u^k$, with the one-point probability density
$w$ of the random field and $\mathbb{E}$ the expectation over realizations of the random
field. The product density of the random field is $\rho_2^u(r) = \mathbb{E}[u(\mathbf{x}_1) u(\mathbf{x}_2)]$ with
$r = |\mathbf{x}_1 - \mathbf{x}_2|$. For a general discussion of random field models, see jlj.
In this model the one-point density of the marks is $\mathcal{M}_1(m) = w(m)$, and $\overline{m^k} = \overline{u^k}$
etc. The conditional mark density is given by

$$ \mathcal{M}_2(m_1, m_2 | \mathbf{x}_1, \mathbf{x}_2) = \mathbb{E}\left[ \delta(m_1 - u(\mathbf{x}_1))\, \delta(m_2 - u(\mathbf{x}_2)) \right] , \tag{1.33} $$

where $\delta$ is the Dirac delta distribution. Clearly, this expression is only
well-defined under a suitable integral over the marks. With Eq. (1.7) one obtains

$$ \langle m \rangle_P(r) = \overline{u} , \quad \langle m^2 \rangle_P(r) = \overline{u^2} , \quad \langle m_1 m_2 \rangle_P(r) = \rho_2^u(r) , \tag{1.34} $$

and the mark-correlation functions defined in Sect. 1.1.4 read

$$ k_m(r) = 1 , \quad k_{mm}(r) = \rho_2^u(r)/\overline{u}^2 , \quad \gamma(r) = \overline{u^2} - \rho_2^u(r) , $$

$$ \mathrm{cov}(r) = \rho_2^u(r) - \overline{u}^2 , \quad \mathrm{var}(r) = \overline{u^2} - \overline{u}^2 = \sigma_M^2 . \tag{1.35} $$

Therefore, there are some explicit predictions for the random field model: an
empirically determined $k_m$ significantly differing from one not only indicates mark
segregation, but also that the data is incompatible with the random field model.



Looking at Figure 1.3 we see immediately that the galaxy data are not consis- 
tent with the random field model. Similar tests based on the relation between 
$k_{mm}$ and the mark-variogram $\gamma$ were investigated by {ftj and Q . The failure
of the random field model to describe the luminosity segregation in the galaxy 
distribution allows the following plausible physical interpretation: the galaxies 
do not merely trace an independent luminosity field; rather the luminosities of 
galaxies depend on the clustering of the galaxies. We shall try to account for 
this with a better model in the following section. 



1.3.3 The Cox random field model 



In the random field model, the field was only used to generate the points' marks.
In the Cox random field model, on the contrary, the random field determines
the spatial distribution of the points as well. As before, consider a homogeneous
and isotropic random field $u(\mathbf{x}) > 0$. The point process is constructed as a Cox
process (see e.g. ^8|). The mean number of points in a set $B$ is given by the
intensity measure

$$ \Lambda(B) = \int_B \mathrm{d}\mathbf{x}\, a\, u(\mathbf{x}) , \tag{1.36} $$

where $a$ is a proportionality factor fixing the mean number density $\varrho = a \overline{u}$. The
(spatial) product density of the point distribution is

$$ \varrho_2^S(\mathbf{x}_1, \mathbf{x}_2) = a^2 \rho_2^u(r) = a^2\, \overline{u}^2 \left( 1 + \xi_2^u(r) \right) , \tag{1.37} $$

where again $\rho_2^u(r)$ denotes the product density of the random field. $\xi_2^u$ is the
normalized two-point cumulant of the random field (see below). We will also
need the $n$-point densities of the random field:

$$ \rho_n^u(\mathbf{x}_1, \ldots, \mathbf{x}_n) = \mathbb{E}\left[ u(\mathbf{x}_1) \cdots u(\mathbf{x}_n) \right] . \tag{1.38} $$

As in the random field model, the marks trace the field, but this time in
a probabilistic rather than a deterministic way: the mark $m_i$ on a galaxy located
at $\mathbf{x}_i$ is a random variable with the probability density $p(m_i | u(\mathbf{x}_i))$ depending
on the value of the field $u(\mathbf{x}_i)$ at $\mathbf{x}_i$. This can be used as a stochastic model for
the genesis of galaxies depending on the local matter density.
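In order to calculate the conditional mark correlation functions we define the
conditional moments of the mark distribution given the value $u$ of the random
field:

$$ \overline{m^k}(u) = \int \mathrm{d}m\, p(m|u)\, m^k . \tag{1.39} $$

The spatial mark product-density is

$$ \varrho_2^{SM}\left( (\mathbf{x}_1, m_1), (\mathbf{x}_2, m_2) \right) = a^2\, \mathbb{E}\left[ p(m_1 | u(\mathbf{x}_1))\, p(m_2 | u(\mathbf{x}_2))\, u(\mathbf{x}_1) u(\mathbf{x}_2) \right] , \tag{1.40} $$

and with Eq. (1.3)

$$ \mathcal{M}_2(m_1, m_2 | \mathbf{x}_1, \mathbf{x}_2) = \frac{1}{\rho_2^u(r)}\, \mathbb{E}\left[ p(m_1 | u(\mathbf{x}_1))\, p(m_2 | u(\mathbf{x}_2))\, u(\mathbf{x}_1) u(\mathbf{x}_2) \right] , \tag{1.41} $$

for $\rho_2^u(r) \neq 0$ and zero otherwise. The mark correlation functions can therefore
be expressed in terms of weighted correlations of the random field:

$$ \langle m \rangle_P(r) = \frac{1}{\rho_2^u(r)}\, \mathbb{E}\left[ \overline{m}(u(\mathbf{x}_1))\, u(\mathbf{x}_1) u(\mathbf{x}_2) \right] , $$

$$ \langle m^2 \rangle_P(r) = \frac{1}{\rho_2^u(r)}\, \mathbb{E}\left[ \overline{m^2}(u(\mathbf{x}_1))\, u(\mathbf{x}_1) u(\mathbf{x}_2) \right] , \tag{1.42} $$

$$ \langle m_1 m_2 \rangle_P(r) = \frac{1}{\rho_2^u(r)}\, \mathbb{E}\left[ \overline{m}(u(\mathbf{x}_1))\, \overline{m}(u(\mathbf{x}_2))\, u(\mathbf{x}_1) u(\mathbf{x}_2) \right] . $$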
A special choice for $p(m|u)$: To proceed further, we have to specify $p(m|u)$.
As a simple example we choose $m_i$ equal to the value of the field $u(\mathbf{x}_i)$ at the point
$\mathbf{x}_i$, such as in the random field model. Thinking of the random field as a mass
density field and the mark of a galaxy as its luminosity, that means that the galaxies
trace the density field and that their luminosities are directly proportional to
the value of the field. With $p(m|u) = \delta(m - u)$ the conditional mark moments
become $\overline{m^k}(u) = u^k$. The moments of the unconstrained mark distribution read
$\overline{m^k} = \overline{u^{k+1}}/\overline{u}$, and the three basic pair averages are

$$ \langle m \rangle_P(r) = \frac{\rho_3^u(\mathbf{x}_1, \mathbf{x}_1, \mathbf{x}_2)}{\rho_2^u(r)} , \quad \langle m^2 \rangle_P(r) = \frac{\rho_4^u(\mathbf{x}_1, \mathbf{x}_1, \mathbf{x}_1, \mathbf{x}_2)}{\rho_2^u(r)} , $$

$$ \langle m_1 m_2 \rangle_P(r) = \frac{\rho_4^u(\mathbf{x}_1, \mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_2)}{\rho_2^u(r)} . \tag{1.43} $$

Hence, the mark correlation functions defined in Sect. 1.1.4 are determined by
the higher-order correlations of the random field. With the Cox random field
model we go beyond the random field model, e.g.

$$ k_m(r) = \frac{\langle m \rangle_P(r)}{\overline{m}} = \frac{\overline{u}\, \rho_3^u(\mathbf{x}_1, \mathbf{x}_1, \mathbf{x}_2)}{\overline{u^2}\, \rho_2^u(r)} \tag{1.44} $$

is not equal to one any more.



Hierarchical field correlations: At this point, we have to specify the
correlations of the random field $u(\mathbf{x})$. The simplest choice, a Gaussian random field,
is not feasible here, since a number density (cf. Eq. (1.36)) has to be strictly
positive, whereas the Gaussian model allows for negative values. Instead, we will use
the hierarchical ansatz: we first express the two- and three-point correlations in
terms of normalized cumulants $\xi_2^u$ and $\xi_3^u$ (see, e.g., p9p|,|35[),

$$ \rho_2^u(\mathbf{x}_1, \mathbf{x}_2) = \overline{u}^2 \left( 1 + \xi_2^u(\mathbf{x}_1, \mathbf{x}_2) \right) , $$

$$ \rho_3^u(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3) = \overline{u}^3 \left( 1 + \xi_2^u(\mathbf{x}_1, \mathbf{x}_2) + \xi_2^u(\mathbf{x}_2, \mathbf{x}_3) + \xi_2^u(\mathbf{x}_1, \mathbf{x}_3) + \xi_3^u(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3) \right) . \tag{1.45} $$

In order to eliminate $\xi_3^u$ we use the hierarchical ansatz (see e.g. [|l7|):

$$ \xi_3^u(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3) = Q \left( \xi_2^u(\mathbf{x}_1, \mathbf{x}_2)\, \xi_2^u(\mathbf{x}_2, \mathbf{x}_3) + \xi_2^u(\mathbf{x}_2, \mathbf{x}_3)\, \xi_2^u(\mathbf{x}_1, \mathbf{x}_3) + \xi_2^u(\mathbf{x}_1, \mathbf{x}_2)\, \xi_2^u(\mathbf{x}_1, \mathbf{x}_3) \right) . \tag{1.46} $$

This ansatz is in reasonable agreement with data from the galaxy distribution,
provided $Q$ is of the order of unity. Several choices for $\xi_2^u(r)$ and $Q$ lead
to well-defined Cox point process models based on the random field $u(\mathbf{x})$ HMfl.
Now we can express $k_m(r)$ from Eq. (1.44) entirely in terms of the two-point
correlation function $\xi_2^u(r)$ of the random field:

$$ k_m(r) = \frac{1 + 2 \xi_2^u(r) + \xi_2^u(0) + Q \left( \xi_2^u(r)^2 + 2\, \xi_2^u(r)\, \xi_2^u(0) \right)}{\left( 1 + \xi_2^u(r) \right) \left( 1 + \xi_2^u(0) \right)} , \tag{1.47} $$

where we made use of the fact that $\sigma_u^2 = \overline{u^2} - \overline{u}^2 = \overline{u}^2\, \xi_2^u(0)$. Inserting typical
parameters found from the spatial clustering of the galaxy distribution we see
from Fig. 1.12 that the Cox random field model allows us to qualitatively
describe the observed luminosity segregation in Fig. 1.3. But the amplitude of $k_m$
predicted by this model is too high. The Cox random field model, however, is
quite flexible in allowing for different choices for $p(m|u)$; also different models for
the higher-order correlations of the random field may be used, e.g. a log-normal
random field fl4|,|^| . Clearly more work is needed to turn this into a viable model
for the galaxy distribution.
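Equation (1.47) is simple enough to evaluate directly. The sketch below does so for a power-law two-point cumulant that is held constant below a truncation scale, using as default values a correlation length of $5\,h^{-1}$ Mpc, a slope of 1.7, a truncation at $0.1\,h^{-1}$ Mpc and $Q = 1$. The function names and the way the truncation is implemented (a flat plateau below the cut) are assumptions made for illustration, so the numbers need not coincide exactly with those underlying Fig. 1.12.

```python
def xi2(r, r0=5.0, gamma=1.7, r_cut=0.1):
    """Power-law two-point cumulant (r0 / r)**gamma, held constant for
    r < r_cut so that xi2(0) stays finite. Lengths in h^-1 Mpc."""
    return (r0 / max(r, r_cut)) ** gamma


def k_m_cox(r, q=1.0, **xi_kwargs):
    """k_m(r) of the Cox random field model with marks tracing the field,
    Eq. (1.47), using the hierarchical ansatz with amplitude q."""
    xi_r = xi2(r, **xi_kwargs)
    xi_0 = xi2(0.0, **xi_kwargs)
    numerator = 1.0 + 2.0 * xi_r + xi_0 + q * (xi_r ** 2 + 2.0 * xi_r * xi_0)
    return numerator / ((1.0 + xi_r) * (1.0 + xi_0))
```

In this sketch $k_m(r)$ decreases with $r$ and tends to one on large scales, where $\xi_2^u(r) \to 0$; inside the truncation radius it saturates at a value of about three for $Q = 1$ and $\xi_2^u(0) \gg 1$.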
Fig. 1.12. The $k_m(r)$ function for the Cox random field model according to Eq. (1.47).
We use $Q = 1$ and $\xi_2^u(r) = (5\,h^{-1}\,\mathrm{Mpc}/r)^{1.7}$ truncated on small scales at
$\xi_2^u(r < 0.1\,h^{-1}\,\mathrm{Mpc}) = \sigma_u^2/\overline{u}^2 = \xi_2^u(0) \approx 750$.



1.4 Conclusions

Whenever objects are sampled together with their spatial positions and some 
of their intrinsic properties, marked point processes are the stochastic models 
for those data sets. Combining the spatial information and the objects' inner 
properties one can constrain their generation mechanism and their interactions. 
Developing the framework of marked point processes further and outlining some 
of their general notions is thus of interest for physical applications. Let us there- 
fore look at mark correlations again from both a statistical and a physical per- 
spective. We focused on two kinds of dependencies. 

On the one hand, one can always ask whether objects of different types "know"
about each other. From a statistical point of view, this is the question whether
the marked point process consists of two completely independent sub-point pro- 
cesses. Physically, this concerns the question whether the objects have been 
generated together and whether they interact with each other. 
On the other hand, it is often interesting to know whether the spatial distri- 
bution of the objects changes with their inner properties. For the statistician, 
this translates into the question whether mark segregation or mark-independent 
clustering is present. For the physicist such a dependency is interesting since 
one can learn from it whether and how the interactions distinguish between
different object classes or whether the formation of the objects' mark depends 
on the environment. 

We discussed statistics capable of probing to which extent mark correlations are 
present in a given data set, and showed how to assess the statistical significance. 
Applying our statistics to real data, we could demonstrate that the clustering
of galaxies depends on their luminosities. Large scale correlations of the orienta- 
tions of dark matter halos were found. Using the Mars data we could validate a 
picture of crater generation on the Martian surface: mainly, the local geological 
setting determines the crater type. We also could show that the sizes of pores in 
sandstone are correlated. 

In order to understand empirical data sets in detail, we need models to compare 
to. As generic models the Boolean depletion model, the random field model and 
its extension, the Cox random field model, are of interest.

Further application of mark correlations may inspire the development of
further models. It seems therefore that marked point processes could
spark interesting interactions between physicists and mathematicians. Certainly,
the distributions of physicists and mathematicians in coffee breaks at the
Wuppertal conference were each clustered. But could one observe positive
cross-correlations? Using mark correlations we argue that, even more, there is
plenty of room for positive interactions.
Appendix: Completeness of mark correlation functions



In order to form versatile test functions for describing mark segregation effects,
we integrated the conditional mark probability density $\mathcal{M}_2(m_1, m_2|r)$
twice in mark space, thereby weighting with a function of the marks $f(m_1, m_2)$
(see Eq. 1.7). Such a pair-averaging reduces the full information present in
$\mathcal{M}_2(m_1, m_2|r)$. So one may ask whether, or in which sense, the mark correlation
functions give a complete picture of the two-point mark correlations present.
For scalar marks $m_i$ this task is trivial. With a polynomial weighting function
$f(m_1, m_2) \sim m_1^{n_1} m_2^{n_2}$ ($n_1, n_2 = 0, 1, \ldots$) we consider moments of
$\mathcal{M}_2(m_1, m_2|r)$; hence, we can be complete only up to a given polynomial order
in the marks $m_1$ and $m_2$. At first order there is only the mean $\langle m \rangle_P(r)$. At
second order we have $\langle m^2 \rangle_P(r)$ and $\langle m_1 m_2 \rangle_P(r)$. All the mark correlation
functions discussed in Sect. 1.1.4 can be constructed from these three pair
averages$^3$. Higher-order moments of the marks involve more and more cross-terms.
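
The three pair averages named above can be estimated directly from a marked point set. The following is only a minimal sketch under simplifying assumptions: a single distance bin, no edge correction, and a brute-force loop over all pairs. The function names `pair_moments` and `pair_covariance` are hypothetical, and the covariance-like combination is shown purely as one illustration of how a second-order quantity can be assembled from the three averages; it is not claimed to match the normalization used in the main text.

```python
import math
from itertools import combinations


def pair_moments(points, marks, r_min, r_max):
    """Estimate <m>_P, <m^2>_P and <m1 m2>_P for pairs with r_min <= r < r_max.

    Simplified estimator: one distance bin, no edge correction.
    """
    n_pairs = 0
    sum_m = sum_m2 = sum_m1m2 = 0.0
    for i, j in combinations(range(len(points)), 2):
        r = math.dist(points[i], points[j])
        if r_min <= r < r_max:
            m1, m2 = marks[i], marks[j]
            n_pairs += 1
            # Symmetrize over the two members of the pair.
            sum_m += 0.5 * (m1 + m2)
            sum_m2 += 0.5 * (m1 * m1 + m2 * m2)
            sum_m1m2 += m1 * m2
    if n_pairs == 0:
        raise ValueError("no pairs in this distance bin")
    return {
        "m": sum_m / n_pairs,
        "m2": sum_m2 / n_pairs,
        "m1m2": sum_m1m2 / n_pairs,
    }


def pair_covariance(moments):
    """Illustrative second-order combination: <m1 m2>_P - <m>_P^2."""
    return moments["m1m2"] - moments["m"] ** 2


# Toy data (made up for illustration only).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
marks = [1.0, 2.0, 3.0, 10.0]
moments = pair_moments(points, marks, 0.5, 1.2)
```

In practice one would repeat the call for a sequence of distance bins to obtain the $r$-dependence, and compare against the same quantities computed after randomly reshuffling the marks among the points.
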
For vector-valued marks, however, it is not obvious that the test quantities
proposed in Sect. 1.1.4 trace all possible correlations between the vectors up to
third order. To settle this case we have to consider the framework of geometric
algebra, also called Clifford algebra. A detailed introduction to geometric algebra
is given in [31]; shorter introductions are [26] and [38]. In geometric algebra one
assigns a unique meaning to the geometric product (or Clifford product) of quantities
like vectors, directed areas, directed volumes, etc. The geometric product
$\mathbf{a}\mathbf{b}$ of two vectors $\mathbf{a}$ and $\mathbf{b}$ splits into its symmetric and antisymmetric part

$$\mathbf{a}\mathbf{b} = \mathbf{a}\cdot\mathbf{b} + \mathbf{a}\wedge\mathbf{b}. \tag{1.48}$$

Here $\mathbf{a}\cdot\mathbf{b}$ denotes the usual scalar product; in three dimensions, the wedge
product $\mathbf{a}\wedge\mathbf{b}$ is closely related to the cross product between these two vectors.
However, $\mathbf{a}\wedge\mathbf{b}$ is not a vector like $\mathbf{a}\times\mathbf{b}$, but a bivector, i.e. a directed area.
Higher products of vectors can be simplified according to the rules of geometric
algebra (for details see [31]).
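
As a small numerical illustration of the split in Eq. (1.48), the sketch below computes the symmetric part (scalar product) and the antisymmetric part (bivector components) for two three-dimensional vectors. The representation of the bivector by its three components on the basis $e_2\wedge e_3$, $e_3\wedge e_1$, $e_1\wedge e_2$ is a choice made here for convenience; in three dimensions these components coincide numerically with those of the cross product. The helper names are hypothetical.

```python
def dot(a, b):
    """Symmetric part of the geometric product: the scalar product."""
    return sum(x * y for x, y in zip(a, b))


def wedge(a, b):
    """Antisymmetric part: bivector components on (e2^e3, e3^e1, e1^e2)."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def modulus_sq(bivector):
    """Squared modulus of a bivector given by its three components."""
    return sum(c * c for c in bivector)


a = (1.0, 2.0, 3.0)
b = (-2.0, 0.5, 4.0)

sym = dot(a, b)     # scalar part of ab
anti = wedge(a, b)  # bivector part of ab
# Lagrange's identity: |a ^ b|^2 = a^2 b^2 - (a . b)^2
lagrange = dot(a, a) * dot(b, b) - sym ** 2
```

Swapping the arguments leaves `dot` unchanged and flips the sign of every component returned by `wedge`, which is the symmetric/antisymmetric behaviour stated in the text.
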

Let us consider the situation where objects situated at $\mathbf{x}_1$ and $\mathbf{x}_2$ bear vector
marks $\mathbf{l}_1$ and $\mathbf{l}_2$, respectively, and let the normalized distance vector be
$\hat{\mathbf{r}} = (\mathbf{x}_1 - \mathbf{x}_2)/r$. Note that $\hat{\mathbf{r}}$ is not a mark at all; rather, it can be thought of as
another vector which may be useful for constructing mark correlation functions.

3 This completeness of $\langle m^2 \rangle_P(r)$ and $\langle m_1 m_2 \rangle_P(r)$ at the two-point level, however,
does not imply that one should not consider linear combinations of them. For
instance, it may well be the case that only certain linear combinations yield
significant results.

For many applications it is reasonable to assume isotropy in mark space, i.e.
all of the mark correlation functions are invariant under common rotations of
the marks. For galaxies, e.g., there does not seem to be an a priori preferred
direction for their orientation. In more detail we then have

$$\mathcal{M}_1(\mathbf{l}) = \mathcal{M}_1(R\mathbf{l}) = \mathcal{M}_1(|\mathbf{l}|),$$
$$\mathcal{M}_2(\mathbf{l}_1, \mathbf{l}_2|r) = \mathcal{M}_2(R\mathbf{l}_1, R\mathbf{l}_2|r),$$

and so on, where R is an arbitrary rotation in mark space. This means that the 
mark correlation functions depend only on rotationally invariant combinations 
of the vector marks. Therefore, only rotationally invariant combinations of vec- 
tors are sensible building blocks for weighting functions. We thus can restrict 
ourselves to scalar weighting functions, which result in coordinate-independent 
vector-mark correlation functions. 
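
To make the role of rotational invariance concrete, the sketch below applies one common rotation to two vector marks and compares a scalar combination with a single vector component. The rotation is taken about the z-axis only for simplicity, and the vectors and angle are arbitrary illustrative values; the helper names are hypothetical.

```python
import math


def rotate_z(v, theta):
    """Rotate the 3D vector v by the angle theta about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1], v[2])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


l1 = (1.0, 0.5, -2.0)
l2 = (0.3, -1.0, 0.7)
theta = 0.9

l1_rot = rotate_z(l1, theta)
l2_rot = rotate_z(l2, theta)

# Scalar combinations are unchanged by the common rotation ...
scalar_before = dot(l1, l2)
scalar_after = dot(l1_rot, l2_rot)
# ... whereas an individual component is not.
component_before = l1[0]
component_after = l1_rot[0]
```

A weighting function built from `dot(l1, l2)` or from the moduli therefore yields the same pair average in any rotated frame, while one built from a single component would depend on the arbitrary choice of coordinates.
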

Again we proceed by considering mixed moments as basic combinations. We restrict
ourselves to scalar quantities that are polynomial in the vector components.
One may also discuss moments in a broader sense, allowing for vector moduli.
In this wider sense, for example, $|\mathbf{l}_1|$ or $|\mathbf{l}_1 \times (\mathbf{l}_1 \times \hat{\mathbf{r}})|$ would be allowed. We do
not consider such quantities here, because they are not polynomial in the vector
components; their squares appear at higher orders anyway. Furthermore, it
turns out that the characterization we will provide depends on the embedding
dimension. The first- and second-order moments are identical in two and three
dimensions, but at third order they start to differ.



1. In the strict sense of scalar quantities being linear in the vector components,
there are no first-order moments for vectors.

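2. At second order we encounter the following products: $\hat{\mathbf{r}}\hat{\mathbf{r}}$, $\mathbf{l}_1\mathbf{l}_2$, $\hat{\mathbf{r}}\mathbf{l}_1$. Note
that, e.g., $\mathbf{l}_1\hat{\mathbf{r}}$ and $\mathbf{l}_2\hat{\mathbf{r}}$ do not make any difference as regards the mark
correlation functions, since the pair averages implicitly render the indices
symmetric; moreover, although the geometric product is non-commutative, $\mathbf{l}_1\wedge\hat{\mathbf{r}}$
and $\hat{\mathbf{r}}\wedge\mathbf{l}_1$ do not lead to different mark correlation functions. Furthermore,
$\hat{\mathbf{r}}\hat{\mathbf{r}} = 1$. The product $\mathbf{l}_1\mathbf{l}_1 = \mathbf{l}_1\cdot\mathbf{l}_1 = l_1^2$ provides us with higher moments of
the modulus of the vectors. To investigate these kinds of correlations, scalar
marks would already be sufficient. New information is encoded in the other products.
Consider $\mathbf{l}_1\mathbf{l}_2 = \mathbf{l}_1\cdot\mathbf{l}_2 + \mathbf{l}_1\wedge\mathbf{l}_2$. The symmetric part $\mathbf{l}_1\cdot\mathbf{l}_2$ is clearly a scalar
and defines the alignment $\mathcal{A}(r)$ (Eq. 1.14). The antisymmetric part $\mathbf{l}_1\wedge\mathbf{l}_2$ is a
bivector. Its modulus, which is unique (see again [31]),
$|\mathbf{l}_1\wedge\mathbf{l}_2| = \sqrt{l_1^2 l_2^2 - (\mathbf{l}_1\cdot\mathbf{l}_2)^2}$,
may be useful, but is no longer a polynomial in the vector components;
$|\mathbf{l}_1\wedge\mathbf{l}_2|^2$ appears at the fourth order. In a completely analogous way we can
treat $\mathbf{l}_1\hat{\mathbf{r}} = \mathbf{l}_1\cdot\hat{\mathbf{r}} + \mathbf{l}_1\wedge\hat{\mathbf{r}}$. The symmetric part $\mathbf{l}_1\cdot\hat{\mathbf{r}}$ defines $\mathcal{F}(r)$. Hence at
second order, the only possible vector-mark correlation functions are $\mathcal{A}(r)$
and $\mathcal{F}(r)$.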

3. At third order we have to consider products of three vectors. In general the
product of three vectors $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ splits into

$$\mathbf{a}\mathbf{b}\mathbf{c} = \mathbf{a}(\mathbf{b}\cdot\mathbf{c}) + (\mathbf{a}\cdot\mathbf{b})\mathbf{c} - (\mathbf{a}\cdot\mathbf{c})\mathbf{b} + \mathbf{a}\wedge(\mathbf{b}\wedge\mathbf{c}), \tag{1.49}$$

i.e., a vector (consisting of the first three terms) and a pseudo-scalar, a
directed volume. In two dimensions the pseudo-scalar $\mathbf{a}\wedge(\mathbf{b}\wedge\mathbf{c})$ vanishes.
Now we have to form all possible products of the three vectors $\mathbf{l}_1$, $\mathbf{l}_2$, $\hat{\mathbf{r}}$ and
derive scalars from them. In three dimensions the only new combination is the
pseudo-scalar $\mathbf{l}_1\wedge(\mathbf{l}_2\wedge\hat{\mathbf{r}})$, giving the oriented volume $\mathbf{l}_1\cdot(\mathbf{l}_2\times\hat{\mathbf{r}})$. Unfortunately, this
oriented volume averages out to zero. Thus, in a strict sense, there are no
interesting third-order quantities. Closely related, however, is the modulus
of the pseudo-scalar, $|\mathbf{l}_1\cdot(\mathbf{l}_2\times\hat{\mathbf{r}})|$, which is proportional to our $\mathcal{V}(r)$. This
expression is invariant under permutations of the vectors.
4. At third order and in two dimensions all of the relevant combinations are
products of first- and second-order combinations; no specifically new combination
appears. This is different from the case of three dimensions, where at
third order an entirely new geometric object, the pseudo-scalar $\mathbf{l}_1\wedge(\mathbf{l}_2\wedge\hat{\mathbf{r}})$,
can be constructed. There is a general scheme behind this argument: since
in $d$ dimensions the outer (wedge) product of more than $d$ vectors vanishes,
all relevant combinations of vectors at orders higher than $d$ are essentially
products of lower-order combinations.
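
The second- and third-order building blocks listed above can be evaluated for a single pair of objects as in the following sketch. It returns the raw scalar combinations only; the normalizations (and any absolute values) entering the definitions of the alignment and of the other vector-mark correlation functions in the main text are not reproduced here. The function `pair_weights` and its dictionary keys are hypothetical names chosen for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def pair_weights(x1, l1, x2, l2):
    """Raw rotation-invariant combinations of l1, l2 and r_hat for one pair."""
    d = tuple(p - q for p, q in zip(x1, x2))
    r = math.sqrt(dot(d, d))
    r_hat = tuple(c / r for c in d)       # normalized distance vector
    triple = dot(l1, cross(l2, r_hat))    # oriented volume (pseudo-scalar)
    return {
        "r": r,
        "l1_dot_l2": dot(l1, l2),         # second order: symmetric part of l1 l2
        "l1_dot_rhat": dot(l1, r_hat),    # second order: symmetric part of l1 r_hat
        "triple": triple,                 # third order, signed
        "abs_triple": abs(triple),        # modulus of the pseudo-scalar
    }


# Three mutually orthogonal unit vectors span a unit volume.
w3d = pair_weights((1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                   (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))

# Vectors confined to a plane (the two-dimensional case): no volume.
w2d = pair_weights((2.0, 1.0, 0.0), (1.0, 2.0, 0.0),
                   (0.0, 0.0, 0.0), (3.0, -1.0, 0.0))
```

Averaging each entry over all pairs in a distance bin gives the corresponding unnormalized pair average. The planar example shows numerically that the third-order pseudo-scalar carries no information when all vectors lie in two dimensions.
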